Codementor Events

MongoDB Aggregation Framework - Working With Arrays

Published Jul 22, 2020Last updated Jan 18, 2021
MongoDB Aggregation Framework - Working With Arrays

1. Introduction

MongoDB Aggregation Framework comprises of a Pipeline, Stages, and Operators. It is a data processing pipeline, where data is processed in stages, each stage performing different operation. Operators are applied to data at each stage for calculation, transformation, and shaping the output.

The aggregation query has the syntax db.collection.aggregate( [ pipeline stage1, stage 2, ..., stage n ] ), and usually returns document(s) as output. These queries are used for querying, reporting, or creating a transformed intermediate collection for further processing. As of MongoDB v4.2, a pipeline can also be used for updating a collection.

We will discuss both the querying and updating with example code, using operators for working with array type fields.

Aggregation Framework has operators to work with array fields of a MongoDB collection document. There are two categories of array operators:

  • Array Expression Operators
  • Set Expression Operators

In this article, we will use the Array Expression Operators using an example. There are about 15 operators in this category (as of MongoDB v4.2), and we will try some of them.


2. The Example Data

An example of data is the Order and Order Line Items. In a normalized form, these two entities result in model of two separate data entities. With MongoDB's flexible schema, this One-to-Many relationship results in a model as a single collection. Note this 1-N relationship clearly specifies that an Order can have at most 100 maximum line items. The data can be structured as follows:

orders:
  _id:
  reference:
  date:
  status:
  customer:
  lines: [ 0.. 100 ]

The lines field has an array of Order Line Items, with each line as a sub-document (or embedded document) with the following attributes:

no:
item:
quantity:
price:

Working from the mongo shell, we will start with creating a database called as orders and a collection called orderData:

> use orders
> let order =
{
    _id: 1,
    reference: "ORD-1",
    date: ISODate(),
    customer: "Database User",
    status: "NEW",
    lines: [
        { item: "ABC", quantity: 4, price: 10.5 },
        { item: "XYZ", quantity: 1, price: 88.0 },
        { item: "123", quantity: 30, price: 1.99 }
    ]
}

> db.orderData.insertOne(order)

The above commands create the database and the collection with one document. Query the collection:

> db.orderData.find().pretty()

3. Aggregation Queries

We will code some queries applying the Aggregation Array Operators on the newly created collection document's Order Lines array field lines.

3.1. Count The Number Of Line Items For Each Order

The $size array operator returns the length of an array.

db.orderData.aggregate([ 
  { 
      $project: { linesSize: { "$size": "$lines" } } 
  } 
])

The output: { "_id" : 1, "linesSize" : 3 }

In the above aggregation, the pipeline has one stage only, the $project. The $project allows field inclusion, exclusion, and also include new derived fields.

3.2. Is The Line Items Field Of Type Array?

The $isArray returns a boolean true if the field type is an "array".

db.orderData.aggregate([ 
  { 
      $project: { linesIsArray: { "$isArray": "$lines" } } 
  } 
])

The output: { "_id" : 1, "linesIsArray" : true }

3.3. Using Two Array Operators ($size and $isArray) Together

First, we will add another document to the orderData collection; this one without any line items:

{
        "_id" : 2,
        "reference" : "ORD-2",
        "date" : ISODate("2020-07-13T16:31:26.459Z"),
        "customer" : "SQL User",
        "status" : "NEW"
}

The query and the output:

db.orderData.aggregate([ 
  { 
      $project: { 
          linesSize: { 
              $cond: [ 
                  { "$isArray": "$lines" }, 
                  { "$size": "$lines" }, 
                  0 
              ]  
          } 
      }  
  } 
])

{ "_id" : 1, "linesSize" : 3 }
{ "_id" : 2, "linesSize" : 0 }

In this query, in addition to the $size and $isArray operators, we used a conditional operator. $cond is used to make sure the field is an array and get the length, if not return a zero length. Note that if you try to use the $size on a non-existing field, there will be an error.


4. Iterating The Arrays

Iterating an array and doing some operations like filtering, mapping, and reduction are common on array elements. You use a filter is to match and retain some values only, and do mapping to transform each array element, and reduce to find the sum, average, etc., of array elements. There are array operators to perform similar tasks in the aggregation framework: $filter, $map, and $reduce.

4.1. Find Line Items With Price Greater Than N

The $filter operator lets iterate over the array elements and apply a specified condition. The result has the array elements which match the condition.

db.orderData.aggregate([
  { 
      $match: { lines: { $type: "array" } } 
  },
  { 
      $addFields: { 
          lines: { 
              $filter: { 
                  input: "$lines", 
                  as: "line", 
                  cond: { $gt: [ "$$line.price" ,  50.0 ] }
              }
          }
      }
  }
])

The output:

         ... "lines" : [
                 {
                         "item" : "XYZ",
                         "quantity" : 1,
                         "price" : 88
                 }
         ]

The $addFields stage is a variation of the $project, mostly used to add new transformed or derived fields to the document. The first stage of the pipeline, $match, passes the documents with the lines as an array field to the next pipeline stage.

4.2. Calculate Total Amount Of All Line Items

The $reduce array operator lets iterate over the array elements and does some custom accumulation, like in this case sum of all line amounts.

db.orderData.aggregate([
  { 
      $project: { 
          total_line_amount: { 
              $reduce: { 
                  input: { $ifNull: [ "$lines", [] ] }, 
                  initialValue: 0, 
                  in: { 
                      $add: [ 
                          "$$value", 
                          { $multiply: [ "$$this.price", "$$this.quantity" ] } 
                      ] 
                  }
              }
          }
      }
  }
])

The aggregation returns the sum of all line item values for both the documents as:

{ "_id" : 1, "total_line_amount" : 189.7 }
{ "_id" : 2, "total_line_amount" : 0 }

The $ifNull operator is used to check if a field does not exist or is null. In such a case, substitutes an empty array. Using the $ifNull, the order line with _id 2 is introduced with lines: [ ] (an array field with zero elements) for this aggregation.

4.3. Calculate A New Discount Field For Each Of Line Items

The $map array operator is used to transform each array element; in this case, introduce a new field discount. The discount is assigned as 0.99 or 1.99, depending upon the line amount (price * quantity).

db.orderData.aggregate([
  { 
      $match: { _id: 1 }
  },
  { 
      $addFields: { 
          lines: { 
              $map: { 
                  input: "$lines", 
                  as: "line", 
                  in: {
                      $mergeObjects: [ 
                          "$$line",  
                          { discount: { 
                               $cond: [ 
                                   { $gt: [ { $multiply: [ "$$line.price", "$$line.quantity" ] },  50.0 ] }, 
                                    1.99, 
                                    0.99 
                               ]  
                          } }
                      ]
                  }
              }
          }
      }
  }
])

The output:

         ..."lines" : [
                 {
                         "item" : "ABC",
                         "quantity" : 4,
                         "price" : 10.5,
                         "discount" : 0.99
                 },
                 {
                         "item" : "XYZ",
                         "quantity" : 1,
                         "price" : 88,
                         "discount" : 1.99
                 },
                 ...

5. Updates with Aggregation Pipeline

Starting MongoDB version 4.2, you can use an Aggregation Pipeline for update operations.

This feature is useful in some situations, as the update can take more than one operation; typically, a read, followed by transforming the data and then the update (in which the Aggregation query is used with the reading and transform operations). This new feature allows all these operations performed atomically.

As an example, calculate tax for each order line item and update the document with the tax amount. The tax value is three percent of the line item value for orders which have the status as "NEW".

db.orderData.updateMany(
  { status: "NEW",  lines: { $exists: true } },
  [
    { 
        $set: { 
            lines: { 
                $map: { 
                    input: "$lines", 
                    as: "line", 
                    in: {
                        $mergeObjects: [ 
                           "$$line",  
                            { tax: { $multiply: [ 0.03, "$$line.quantity", "$$line.price" ]  } }
                        ]
                    }
                }
            }
        }
    }
  ]
)

Query the document to see the updated document with the lines showing the new tax field:

db.orderData.findOne( { _id: 1 } )
        ...
        "lines" : [
                {
                        "item" : "ABC",
                        "quantity" : 4,
                        "price" : 10.5,
                        "tax" : 1.26        <----- the newly added field
                }, ...


6. Useful Links

The code used in the examples can be accessed at the GitHub gist

Discover and read more posts from Prasad Saya
get started
post comments4Replies
Muhammad Akhtar Assi
a year ago

Great Information . https://modapktv.com/

Viacheslav Garmash
2 years ago

Your samples were really helpful. Using $map and $filter in conjunction with other things, like $mergeObjects helped me to implement my pipeline without using $unwind stage. Thank you for great tips!

Peter Blackwell from Email to Text
4 years ago

Great!

Show more replies