MongoDB – Aggregation

Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on…

Aggregation operations perform data record processing and return computed results. Aggregation operations combine values from multiple documents into a single result and can perform a variety of operations on the grouped data. In SQL, count(*) and group by are equivalent to MongoDB aggregation.

The aggregate() Method

For the aggregation in MongoDB, you should use aggregate() method.

Syntax

The basic syntax of aggregate() method is as follows –

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example

In the collection, you have the following data −

{
   _id: ObjectId(7df78ad8902c)
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by_user: 'phptpoint',
   url: 'https://www.codelivly.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d)
   title: 'NoSQL Overview', 
   description: 'No sql database is very fast',
   by_user: 'phptpoint',
   url: 'https://www.codelivly.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e)
   title: 'Neo4j Overview', 
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

If you want to show a list of how many tutorials are written by each user from the above set, then you can use the following aggregate() method –

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{ "_id" : "phptpoint", "num_tutorial" : 2 }
{ "_id" : "Neo4j", "num_tutorial" : 1 }
>

SQL equivalent query for the above use case will be select by_user, count(*) from mycol group by by_user.
In the preceding example, we grouped documents by the field by the user, and the previous value of the sum is incremented on each occurrence by the user. The following is a list of aggregation expressions. 

EXPRESSIONDESCRIPTIONEXAMPLE
$sumSummarizes the defined value across all documents in the collection.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, num_tutorial : {$sum : “$likes”}}}])
$avgCalculates the average of all given values from all documents in the collection.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, num_tutorial : {$avg : “$likes”}}}])
$minGets the minimum of the corresponding values from all documents in the collection.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, num_tutorial : {$min : “$likes”}}}])
$maxGets the maximum of the corresponding values from all documents in the collection.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, num_tutorial : {$max : “$likes”}}}])
$pushInserts the value to an array in the resulting document.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, url : {$push: “$url”}}}])
$addToSetInserts the value to an array in the resulting document but does not create duplicates.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, url : {$addToSet : “$url”}}}])
$firstRetrieves the first document from the source documents based on the grouping. Typically, this makes sense only in conjunction with a previously applied “$sort”-stage.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, first_url : {$first : “$url”}}}])
$lastGets the last document from the source documents according to the grouping. Typically this makes only sense together with some previously applied “$sort”-stage.db.empDetails.aggregate([{$group : {_id : “$jobTitleName”, last_url : {$last : “$url”}}}])

Pipeline Concept

Shell pipeline in UNIX commands refers to the ability to execute an operation on some input and use the output as the input for the next command, and so on. MongoDB supports the same concept in its aggregation framework. There are a number of possible stages, each of which takes a set of documents as input and produces a set of documents as output (or the final resulting JSON document at the end of the pipeline). This can then be applied to the next stage, and so on.Following are the possible stages in aggregation framework −

  1. $project − Used to select some specific fields from a collection.
  2. $match − This is a filtering operation and thus this can reduce the amount of documents that are given as input to the next stage.
  3. $skip − With this, it is possible to skip forward in the list of documents for a given amount of documents.
  4. $limit − This limits the amount of documents to look at, by the given number starting from the current positions.
  5. $group − This does the actual aggregation as discussed above.
  6. $sort − Sorts the documents.
  7. $unwind − This is used to unwind document that are using arrays. When using an array, the data is kind of pre-joined and this operation will be undone with this to have individual documents again. Thus with this stage we will increase the amount of documents for the next stage. 

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *