MongoDB Aggregation Framework - Working With Arrays
1. Introduction
MongoDB Aggregation Framework comprises of a Pipeline, Stages, and Operators. It is a data processing pipeline, where data is processed in stages, each stage performing different operation. Operators are applied to data at each stage for calculation, transformation, and shaping the output.
The aggregation query has the syntax db.collection.aggregate( [ pipeline stage1, stage 2, ..., stage n ] )
, and usually returns document(s) as output. These queries are used for querying, reporting, or creating a transformed intermediate collection for further processing. As of MongoDB v4.2, a pipeline can also be used for updating a collection.
We will discuss both the querying and updating with example code, using operators for working with array type fields.
Aggregation Framework has operators to work with array fields of a MongoDB collection document. There are two categories of array operators:
- Array Expression Operators
- Set Expression Operators
In this article, we will use the Array Expression Operators using an example. There are about 15 operators in this category (as of MongoDB v4.2), and we will try some of them.
2. The Example Data
An example of data is the Order and Order Line Items. In a normalized form, these two entities result in model of two separate data entities. With MongoDB's flexible schema, this One-to-Many relationship results in a model as a single collection. Note this 1-N relationship clearly specifies that an Order can have at most 100 maximum line items. The data can be structured as follows:
orders:
_id:
reference:
date:
status:
customer:
lines: [ 0.. 100 ]
The lines
field has an array of Order Line Items, with each line as a sub-document (or embedded document) with the following attributes:
no:
item:
quantity:
price:
Working from the mongo
shell, we will start with creating a database called as orders
and a collection called orderData
:
> use orders
> let order =
{
_id: 1,
reference: "ORD-1",
date: ISODate(),
customer: "Database User",
status: "NEW",
lines: [
{ item: "ABC", quantity: 4, price: 10.5 },
{ item: "XYZ", quantity: 1, price: 88.0 },
{ item: "123", quantity: 30, price: 1.99 }
]
}
> db.orderData.insertOne(order)
The above commands create the database and the collection with one document. Query the collection:
> db.orderData.find().pretty()
3. Aggregation Queries
We will code some queries applying the Aggregation Array Operators on the newly created collection document's Order Lines array field lines
.
3.1. Count The Number Of Line Items For Each Order
The $size
array operator returns the length of an array.
db.orderData.aggregate([
{
$project: { linesSize: { "$size": "$lines" } }
}
])
The output:
{ "_id" : 1, "linesSize" : 3 }
In the above aggregation, the pipeline has one stage only, the $project
. The $project
allows field inclusion, exclusion, and also include new derived fields.
3.2. Is The Line Items Field Of Type Array?
The $isArray
returns a boolean true
if the field type is an "array".
db.orderData.aggregate([
{
$project: { linesIsArray: { "$isArray": "$lines" } }
}
])
The output:
{ "_id" : 1, "linesIsArray" : true }
3.3. Using Two Array Operators ($size and $isArray) Together
First, we will add another document to the orderData
collection; this one without any line items:
{
"_id" : 2,
"reference" : "ORD-2",
"date" : ISODate("2020-07-13T16:31:26.459Z"),
"customer" : "SQL User",
"status" : "NEW"
}
The query and the output:
db.orderData.aggregate([
{
$project: {
linesSize: {
$cond: [
{ "$isArray": "$lines" },
{ "$size": "$lines" },
0
]
}
}
}
])
{ "_id" : 1, "linesSize" : 3 }
{ "_id" : 2, "linesSize" : 0 }
In this query, in addition to the $size
and $isArray
operators, we used a conditional operator. $cond
is used to make sure the field is an array and get the length, if not return a zero length. Note that if you try to use the $size
on a non-existing field, there will be an error.
4. Iterating The Arrays
Iterating an array and doing some operations like filtering, mapping, and reduction are common on array elements. You use a filter is to match and retain some values only, and do mapping to transform each array element, and reduce to find the sum, average, etc., of array elements. There are array operators to perform similar tasks in the aggregation framework: $filter
, $map
, and $reduce
.
4.1. Find Line Items With Price Greater Than N
The $filter
operator lets iterate over the array elements and apply a specified condition. The result has the array elements which match the condition.
db.orderData.aggregate([
{
$match: { lines: { $type: "array" } }
},
{
$addFields: {
lines: {
$filter: {
input: "$lines",
as: "line",
cond: { $gt: [ "$$line.price" , 50.0 ] }
}
}
}
}
])
The output:
... "lines" : [
{
"item" : "XYZ",
"quantity" : 1,
"price" : 88
}
]
The $addFields
stage is a variation of the $project
, mostly used to add new transformed or derived fields to the document. The first stage of the pipeline, $match
, passes the documents with the lines
as an array field to the next pipeline stage.
4.2. Calculate Total Amount Of All Line Items
The $reduce
array operator lets iterate over the array elements and does some custom accumulation, like in this case sum of all line amounts.
db.orderData.aggregate([
{
$project: {
total_line_amount: {
$reduce: {
input: { $ifNull: [ "$lines", [] ] },
initialValue: 0,
in: {
$add: [
"$$value",
{ $multiply: [ "$$this.price", "$$this.quantity" ] }
]
}
}
}
}
}
])
The aggregation returns the sum of all line item values for both the documents as:
{ "_id" : 1, "total_line_amount" : 189.7 }
{ "_id" : 2, "total_line_amount" : 0 }
The $ifNull
operator is used to check if a field does not exist or is null
. In such a case, substitutes an empty array. Using the $ifNull
, the order line with _id 2
is introduced with lines: [ ]
(an array field with zero elements) for this aggregation.
4.3. Calculate A New Discount Field For Each Of Line Items
The $map
array operator is used to transform each array element; in this case, introduce a new field discount
. The discount is assigned as 0.99
or 1.99
, depending upon the line amount (price * quantity
).
db.orderData.aggregate([
{
$match: { _id: 1 }
},
{
$addFields: {
lines: {
$map: {
input: "$lines",
as: "line",
in: {
$mergeObjects: [
"$$line",
{ discount: {
$cond: [
{ $gt: [ { $multiply: [ "$$line.price", "$$line.quantity" ] }, 50.0 ] },
1.99,
0.99
]
} }
]
}
}
}
}
}
])
The output:
..."lines" : [
{
"item" : "ABC",
"quantity" : 4,
"price" : 10.5,
"discount" : 0.99
},
{
"item" : "XYZ",
"quantity" : 1,
"price" : 88,
"discount" : 1.99
},
...
5. Updates with Aggregation Pipeline
Starting MongoDB version 4.2, you can use an Aggregation Pipeline for update operations.
This feature is useful in some situations, as the update can take more than one operation; typically, a read, followed by transforming the data and then the update (in which the Aggregation query is used with the reading and transform operations). This new feature allows all these operations performed atomically.
As an example, calculate tax for each order line item and update the document with the tax amount. The tax value is three percent of the line item value for orders which have the status
as "NEW".
db.orderData.updateMany(
{ status: "NEW", lines: { $exists: true } },
[
{
$set: {
lines: {
$map: {
input: "$lines",
as: "line",
in: {
$mergeObjects: [
"$$line",
{ tax: { $multiply: [ 0.03, "$$line.quantity", "$$line.price" ] } }
]
}
}
}
}
}
]
)
Query the document to see the updated document with the lines
showing the new tax
field:
db.orderData.findOne( { _id: 1 } )
...
"lines" : [
{
"item" : "ABC",
"quantity" : 4,
"price" : 10.5,
"tax" : 1.26 <----- the newly added field
}, ...
6. Useful Links
- Aggregation Pipeline Quick Reference
- Aggregation Pipeline
- Array Expression Operators
- Updates with Aggregation Pipeline
The code used in the examples can be accessed at the GitHub gist
Great Information . https://modapktv.com/
Your samples were really helpful. Using $map and $filter in conjunction with other things, like $mergeObjects helped me to implement my pipeline without using $unwind stage. Thank you for great tips!
Great!