Getting Started with MongoDB – Part 2

Tutorial Details
  • Topic: MongoDB
  • Version: 1.6.5
  • Difficulty: Intermediate
  • Estimated Completion Time: 45 mins

Ready to continue learning about MongoDB, one of the coolest technologies for web developers? In this second part of the series, we move on from the basics to advanced queries, with conditional operators and regular expressions, and to MapReduce.


Stepping Beyond the Basics

Earlier, we covered the basics of mongoDB, one of the best-of-breed NoSQL implementations, and how to get started with it. We looked at how to install it, create a basic database and then perform the basic operations on it:

  • search
  • update
  • delete

In addition to this, we started to look at how to start interacting with mongoDB in a more powerful way, through the use of selectors. Selectors give us the ability to have much more fine grained control and to dig pretty deeply in to find the data that we really want.

Now that’s all well and good to get started with, but when you want to write a real application, you need to go a lot further. Well, this is still a getting started series after all, but I want to get you excited about the possibilities of working in a document-oriented way. I want to enthuse you to take this great technology and make it your own and use it as powerfully as you can to make fantastic applications.

Today, we’re going to expand on queries from last time and learn two key aspects of mongoDB:

  • Advanced Queries
  • MapReduce

Advanced Queries

Previously we looked at basic queries and were introduced to selectors. Now we’re going to get into more advanced queries, by building on the previous work in two key ways:

  • Conditional Operators
  • Regular Expressions

Each of these gives us more fine-grained control over the queries we can write and, consequently, the information that we can extract from our mongoDB databases.

Conditional Operators

Conditional operators are, as the name implies, operators that refine the conditions a query must match when extracting data from a collection. There are a number of them, but today I’m going to focus on seven key ones. These are:

  • $lt – value must be less than the conditional
  • $gt – value must be greater than the conditional
  • $lte – value must be less than or equal to the conditional
  • $gte – value must be greater than or equal to the conditional
  • $in – value must be in a set of conditionals
  • $nin – value must NOT be in a set of conditionals
  • $not – negates a conditional, so the value must NOT match it

Let’s look at each one in turn. Open up your terminal and get ready to use the original database from the first part in this series (pre-modifications). To make this tutorial easier, we’re going to make a slight alteration to the database. We’re going to give each document in our collection an age attribute. To do that, run the following modification queries (the ObjectId values shown were generated on my machine, so substitute the ones from your own collection):

			db.nettuts.update({"_id" : ObjectId("4ef224be0fec2806da6e9b27")}, {"$set" : {"age" : 18 }});
			db.nettuts.update({"_id" : ObjectId("4ef224bf0fec2806da6e9b28")}, {"$set" : {"age" : 45 }});
			db.nettuts.update({"_id" : ObjectId("4ef224bf0fec2806da6e9b29")}, {"$set" : {"age" : 65 }});
			db.nettuts.update({"_id" : ObjectId("4ef224bf0fec2806da6e9b2a")}, {"$set" : {"age" : 43 }});
			db.nettuts.update({"_id" : ObjectId("4ef224bf0fec2806da6e9b2b")}, {"$set" : {"age" : 22 }});
			db.nettuts.update({"_id" : ObjectId("4ef224bf0fec2806da6e9b2c")}, {"$set" : {"age" : 45 }});
			db.nettuts.update({"_id" : ObjectId("4ef224bf0fec2806da6e9b2d")}, {"$set" : {"age" : 33 }});
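If looking up each _id by hand is tedious, the same updates can be keyed on names instead. The sketch below only builds the filter and update documents; it assumes that first and last name together identify each document in your copy of the collection, and ageUpdates is a name made up for this example. The name and age pairs are the ones from this tutorial's dataset.

```javascript
// Name-to-age pairs taken from this tutorial's dataset.
var people = [
    { first: "matthew",   last: "setter",         age: 18 },
    { first: "james",     last: "caan",           age: 45 },
    { first: "arnold",    last: "schwarzenegger", age: 65 },
    { first: "tony",      last: "curtis",         age: 43 },
    { first: "jamie lee", last: "curtis",         age: 22 },
    { first: "michael",   last: "caine",          age: 45 },
    { first: "judi",      last: "dench",          age: 33 }
];

// Build one filter/update pair per person, matching on names rather than _id.
var ageUpdates = people.map(function (p) {
    return {
        filter: { first: p.first, last: p.last },
        update: { "$set": { age: p.age } }
    };
});

// In the mongo shell, each pair could then be applied with:
//   ageUpdates.forEach(function (op) { db.nettuts.update(op.filter, op.update); });
```

Building the documents first makes it easy to inspect them before anything is written to the database.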
		

All being well, you can run a ‘find all’ and you’ll have the following output:

		db.nettuts.find();                                                                          
		{ "_id" : ObjectId("4ef224be0fec2806da6e9b27"), "age" : 18, "dob" : "21/04/1978", "first" : "matthew", "gender" : "m", "hair_colour" : "brown", "last" : "setter", "nationality" : "australian", "occupation" : "developer" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b28"), "age" : 45, "dob" : "26/03/1940", "first" : "james", "gender" : "m", "hair_colour" : "brown", "last" : "caan", "nationality" : "american", "occupation" : "actor" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b29"), "age" : 65, "dob" : "03/06/1925", "first" : "arnold", "gender" : "m", "hair_colour" : "brown", "last" : "schwarzenegger", "nationality" : "american", "occupation" : "actor" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2a"), "age" : 43, "dob" : "21/04/1978", "first" : "tony", "gender" : "m", "hair_colour" : "brown", "last" : "curtis", "nationality" : "american", "occupation" : "developer" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2b"), "age" : 22, "dob" : "22/11/1958", "first" : "jamie lee", "gender" : "f", "hair_colour" : "brown", "last" : "curtis", "nationality" : "american", "occupation" : "actor" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2c"), "age" : 45, "dob" : "14/03/1933", "first" : "michael", "gender" : "m", "hair_colour" : "brown", "last" : "caine", "nationality" : "english", "occupation" : "actor" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2d"), "age" : 33, "dob" : "09/12/1934", "first" : "judi", "gender" : "f", "hair_colour" : "white", "last" : "dench", "nationality" : "english", "occupation" : "actress" }
		

$lt/$lte

Now let’s find all the people who are younger than 40. To do that, run the following query:

		db.nettuts.find( { "age" : { "$lt" : 40 } } ); 
		

After running that query, you’ll see the following output:

		{ "_id" : ObjectId("4ef224be0fec2806da6e9b27"), "age" : 18, "dob" : "21/04/1978", "first" : "matthew", "gender" : "m", "hair_colour" : "brown", "last" : "setter", "nationality" : "australian", "occupation" : "developer" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2b"), "age" : 22, "dob" : "22/11/1958", "first" : "jamie lee", "gender" : "f", "hair_colour" : "brown", "last" : "curtis", "nationality" : "american", "occupation" : "actor" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2d"), "age" : 33, "dob" : "09/12/1934", "first" : "judi", "gender" : "f", "hair_colour" : "white", "last" : "dench", "nationality" : "english", "occupation" : "actress" }
		

What about the ones who are 40 or younger? Run the following query to return that result:

		db.nettuts.find( { "age" : { "$lte" : 40 } } ); 
		

This returns the following list:

		{ "_id" : ObjectId("4ef224be0fec2806da6e9b27"), "age" : 18, "dob" : "21/04/1978", "first" : "matthew", "gender" : "m", "hair_colour" : "brown", "last" : "setter", "nationality" : "australian", "occupation" : "developer" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2b"), "age" : 22, "dob" : "22/11/1958", "first" : "jamie lee", "gender" : "f", "hair_colour" : "brown", "last" : "curtis", "nationality" : "american", "occupation" : "actor" }
		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2d"), "age" : 33, "dob" : "09/12/1934", "first" : "judi", "gender" : "f", "hair_colour" : "white", "last" : "dench", "nationality" : "english", "occupation" : "actress" }
		

$gt/$gte

Now let’s find all the people who are older than 47. Run the following query to find that list:

			db.nettuts.find( { 'age' : { '$gt' : 47 } } );
		

You’ll then get the following output:

		{ "_id" : ObjectId("4ef224bf0fec2806da6e9b29"), "age" : 65, "dob" : "03/06/1925", "first" : "arnold", "gender" : "m", "hair_colour" : "brown", "last" : "schwarzenegger", "nationality" : "american", "occupation" : "actor" }
		

What about inclusive of 47?

			db.nettuts.find( { 'age' : { '$gte' : 47 } } );
		

As nobody in the dataset is exactly 47, the data returned doesn’t change.
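To sanity-check these comparisons without a running server, they can be mimicked in plain JavaScript. This is only a sketch of the semantics, not how mongoDB evaluates queries; findByAge and the operators table are made-up names, and the ages are the ones from the dataset above.

```javascript
// Ages from the tutorial's dataset.
var people = [
    { first: "matthew",   age: 18 },
    { first: "james",     age: 45 },
    { first: "arnold",    age: 65 },
    { first: "tony",      age: 43 },
    { first: "jamie lee", age: 22 },
    { first: "michael",   age: 45 },
    { first: "judi",      age: 33 }
];

// Each comparison operator mapped to the equivalent JavaScript test.
var operators = {
    "$lt":  function (value, bound) { return value <  bound; },
    "$lte": function (value, bound) { return value <= bound; },
    "$gt":  function (value, bound) { return value >  bound; },
    "$gte": function (value, bound) { return value >= bound; }
};

// Hypothetical helper: first names of people whose age satisfies { op : bound }.
function findByAge(op, bound) {
    return people
        .filter(function (p) { return operators[op](p.age, bound); })
        .map(function (p) { return p.first; });
}

console.log(findByAge("$lt", 40));  // matthew, jamie lee, judi
console.log(findByAge("$gt", 47));  // arnold
console.log(findByAge("$gt", 45));  // arnold
console.log(findByAge("$gte", 45)); // james, arnold, michael
```

The last two calls show where the inclusive form matters: two people are exactly 45, so $gte picks them up while $gt does not.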

$in/$nin

What about finding information based on a list of criteria? These first ones have been ok, but arguably, quite trivial. Let’s now look to see which of the people we have are either actors or developers. With the following query, we’ll find that out (to make it a bit easier to read, we’ve limited the keys that are returned to just first and last names):

			db.nettuts.find( { 'occupation' : { '$in' : [ "actor", "developer" ] } }, { "first" : 1, "last" : 1 } );
		

This query yields the following output:

			{ "_id" : ObjectId("4ef224be0fec2806da6e9b27"), "first" : "matthew", "last" : "setter" }
			{ "_id" : ObjectId("4ef224bf0fec2806da6e9b28"), "first" : "james", "last" : "caan" }
			{ "_id" : ObjectId("4ef224bf0fec2806da6e9b29"), "first" : "arnold", "last" : "schwarzenegger" }
			{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2a"), "first" : "tony", "last" : "curtis" }
			{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2b"), "first" : "jamie lee", "last" : "curtis" }
			{ "_id" : ObjectId("4ef224bf0fec2806da6e9b2c"), "first" : "michael", "last" : "caine" }
		

You can see that we can get the inverse of this by using $nin just as simply.
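As a rough illustration of what the two operators select, here is the same membership test in plain JavaScript over the occupations in the dataset. It is a sketch of the semantics only; the variable names are made up for the example.

```javascript
// Occupations from the tutorial's dataset.
var people = [
    { first: "matthew",   occupation: "developer" },
    { first: "james",     occupation: "actor" },
    { first: "arnold",    occupation: "actor" },
    { first: "tony",      occupation: "developer" },
    { first: "jamie lee", occupation: "actor" },
    { first: "michael",   occupation: "actor" },
    { first: "judi",      occupation: "actress" }
];
var wanted = ["actor", "developer"];

function firstName(p) { return p.first; }

// $in keeps documents whose value appears in the list...
var inList = people.filter(function (p) {
    return wanted.indexOf(p.occupation) !== -1;
}).map(firstName);

// ...and $nin keeps the ones whose value does not.
var notInList = people.filter(function (p) {
    return wanted.indexOf(p.occupation) === -1;
}).map(firstName);

console.log(inList);    // everyone except judi
console.log(notInList); // judi

// The shell equivalent of the second filter would be:
//   db.nettuts.find( { 'occupation' : { '$nin' : [ "actor", "developer" ] } }, { "first" : 1, "last" : 1 } );
```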

Let’s make this a bit more fun and combine some of the operators. Let’s say that we want to look for all the people who are either male or developers, and who are less than 40 years of age.

Now that’s a bit of a mouthful, but with the operators that we’ve used so far – quite readily achievable. Let’s work through it and you’ll see. Have a look at the query below:

			db.nettuts.find( { "$or" : [ { "gender" : "m" }, { "occupation" : "developer" } ], "age" : { "$lt" : 40 } }, { "first" : 1, "last" : 1, "occupation" : 1, "dob" : 1 } );
		

You can see that we’ve stipulated, in the $or condition, that either the gender is male or the occupation is developer; each alternative is its own document in the $or array. We’ve then added an and condition of the age being less than 40.

Against the dataset shown earlier, only one person meets both conditions:

			{ "_id" : ObjectId("4ef22e522893ba6797bf8cb6"), "first" : "matthew", "last" : "setter", "dob" : "21/04/1978", "occupation" : "developer" }
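One detail worth checking in any $or query is that each alternative sits in its own document inside the array; two fields placed in a single document must both match. The plain JavaScript sketch below mimics both shapes over the tutorial's data. The helper and variable names are made up for illustration.

```javascript
// Relevant fields from the tutorial's dataset.
var people = [
    { first: "matthew",   gender: "m", occupation: "developer", age: 18 },
    { first: "james",     gender: "m", occupation: "actor",     age: 45 },
    { first: "arnold",    gender: "m", occupation: "actor",     age: 65 },
    { first: "tony",      gender: "m", occupation: "developer", age: 43 },
    { first: "jamie lee", gender: "f", occupation: "actor",     age: 22 },
    { first: "michael",   gender: "m", occupation: "actor",     age: 45 },
    { first: "judi",      gender: "f", occupation: "actress",   age: 33 }
];

function isMale(p) { return p.gender === "m"; }
function isDeveloper(p) { return p.occupation === "developer"; }
function firstName(p) { return p.first; }

// { $or : [ { gender : "m" }, { occupation : "developer" } ] }
var either = people.filter(function (p) { return isMale(p) || isDeveloper(p); });

// { $or : [ { gender : "m", occupation : "developer" } ] }
// A single alternative with two fields requires both to match.
var both = people.filter(function (p) { return isMale(p) && isDeveloper(p); });

// Adding a sibling condition such as age : { $lt : 40 } narrows the result further.
var eitherUnder40 = either.filter(function (p) { return p.age < 40; });

console.log(either.map(firstName));        // matthew, james, arnold, tony, michael
console.log(both.map(firstName));          // matthew, tony
console.log(eitherUnder40.map(firstName)); // matthew
```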
		

Regular Expressions

Now I’m sure that you’re not going to be satisfied with just this. I did promise you some more complexity and advanced functionality. So let’s get into using some regular expressions. Let’s say that we want to find the users that have a first name starting with ‘ma’ or ‘to’ and whose last names begin with ‘se’ or ‘de’. How would we do that?

Have a look at the following query using a regular expression:

			db.nettuts.find( { "first" : /^(ma|to)/i, "last" : /^(se|de)/i  } );
		

Given that, the results will be:

		{ "_id" : ObjectId("4ef22e522893ba6797bf8cb6"), "first" : "matthew", "last" : "setter", "dob" : "21/04/1978", "gender" : "m", "hair_colour" : "brown", "occupation" : "developer", "nationality" : "australian" }
		

Let’s look at that query a bit more closely. Firstly, we’re performing a regex on the first name.

		"first" : /^(ma|to)/i
		

The trailing i indicates that we’re performing a case-insensitive regex.

^ anchors the match to the start of the string, and (ma|to) matches either ‘ma’ or ‘to’.

Without the ^, the pattern would match ‘ma’ or ‘to’ anywhere in the name. Be careful with a trailing * as well: it means “zero or more of the preceding group”, so /(ma|to)*/ also matches the empty string and therefore every name. So when you put it together, we match first names that have either ‘ma’ or ‘to’ at the beginning of them. In the regex for the last name, you can see that we’ve done the same thing with ‘se’ and ‘de’.
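A quick way to see what a pattern really matches is to try it in plain JavaScript, whose regex literal syntax the mongo shell shares. The sketch below compares an anchored pattern with a starred one against the first names in the dataset; for simple patterns like these the behaviour should carry over to mongoDB, though that is an assumption rather than something tested against a server here.

```javascript
var firstNames = ["matthew", "james", "arnold", "tony", "jamie lee", "michael", "judi"];

var anchored = /^(ma|to)/i;  // must start with "ma" or "to"
var starred  = /(ma|to)*/i;  // zero or more repeats, so the empty string matches too

var anchoredMatches = firstNames.filter(function (name) { return anchored.test(name); });
var starredMatches  = firstNames.filter(function (name) { return starred.test(name); });

console.log(anchoredMatches); // matthew, tony
console.log(starredMatches);  // all seven names
```

Because the starred pattern is satisfied by an empty match, it places no restriction on the field at all.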

Not quite sure? Let’s try another one. What about combining it with some of the selectors from before? Let’s say we want to find all the people with the first name of james or jamie who are american female actors. How would we do that? Well, let’s see how we’d do it below:

			db.nettuts.find( { "first" : /^jam(es|ie)/i, "gender" : "f", "occupation" : "actor", "nationality" : "american"  } );
		

The regex above will match first names that begin with james or jamie, such as james, jamie and jamie lee. The ^ anchors the match to the start of the name, jam matches those three letters literally, and (es|ie) matches either of the two endings. From there on, we’re using plain equality selectors to further limit the results that come back. It should be noted that as we’re using the case-insensitive flag, i, the query can’t make efficient use of an index. But for the purposes of this example, it’s fine.

The output of the query above is:

		{ "_id" : ObjectId("4ef22e522893ba6797bf8cba"), "first" : "jamie lee", "last" : "curtis", "dob" : "22/11/1958", "gender" : "f", "hair_colour" : "brown", "occupation" : "actor", "nationality" : "american" }
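To check a name pattern together with the other selectors, the whole query can be mimicked in plain JavaScript. This is a sketch over three records from the dataset; the variable names are made up, and the second pattern is included only to show how a regex in which every part is optional behaves.

```javascript
var people = [
    { first: "james",     gender: "m", occupation: "actor",   nationality: "american" },
    { first: "jamie lee", gender: "f", occupation: "actor",   nationality: "american" },
    { first: "judi",      gender: "f", occupation: "actress", nationality: "english" }
];

var startsWithJamesOrJamie = /^jam(es|ie)/i;
// Every part of this pattern is optional, so it matches any string at all.
var allOptional = /(jam?e*)*/i;

var matches = people.filter(function (p) {
    return startsWithJamesOrJamie.test(p.first) &&
           p.gender === "f" &&
           p.occupation === "actor" &&
           p.nationality === "american";
}).map(function (p) { return p.first; });

console.log(matches);                             // jamie lee
console.log(startsWithJamesOrJamie.test("judi")); // false
console.log(allOptional.test("judi"));            // true
```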
		

MapReduce

MapReduce is the big daddy of data analysis. In case you’ve not heard of it, MapReduce is a process where the aggregation of data can be split up and farmed out across a cluster of computers to reduce the time that it takes to determine an aggregate result on a set of data.

It’s made up of two parts: Map and Reduce. Map processes each input record and emits key/value pairs, which is work that can be farmed out to the worker nodes. Reduce then combines all the values emitted for a given key into a single result, and those results together form the final answer.

If you want a more specific description, here’s what Wikipedia has to say about it:

MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes use the same hardware) or a grid (if the nodes use different hardware). Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).

“Map” step: The master node takes the input, partitions it up into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.

“Reduce” step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.

Example MapReduce

Let’s look at a simple example. We’re going to analyse our simple dataset and find the total count of all the females in the group. Admittedly, this is a very simplistic example, but it will lay the foundation for understanding, practically, how MapReduce works.

The Map Function

Here, we’re going to create a map function that aggregates the information in our dataset by the gender of the person and emits a count of 1 for every one of them.

			var map = function() { 
			    emit( { gender: this.gender }, { count: 1 } ); 
			}
		

For each document, this emits a key and a value similar to the following:

			{ "gender" : "f" }, { "count" : 1 }
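If emit looks unfamiliar: it is a function that mongoDB makes available inside map functions to hand back a key/value pair. The plain JavaScript sketch below supplies a stand-in emit that simply records its arguments, so the pairs can be inspected. The emitted array and the two sample documents are made up for the illustration.

```javascript
// Stand-in for the emit() function that mongoDB supplies to map functions.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

var map = function () {
    emit({ gender: this.gender }, { count: 1 });
};

// The map function is called once per document, with `this` bound to that document.
[{ first: "judi", gender: "f" }, { first: "tony", gender: "m" }].forEach(function (doc) {
    map.call(doc);
});

console.log(JSON.stringify(emitted));
// [{"key":{"gender":"f"},"value":{"count":1}},{"key":{"gender":"m"},"value":{"count":1}}]
```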
		

The Reduce Function

Our reduce function is going to take the output from the map function and use it to keep a running total of the count for each gender. Have a look at the reduce function below.

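			var reduce = function(key, values) {
			    // Start from zero and add up the counts emitted for this key
			    var result = { count : 0 };

			    values.forEach(function(value) {
			        result.count += value.count;
			    });

			    // Same shape as the emitted values, so it can be reduced again
			    return result;
			};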
		

Running the MapReduce

Now, we put them together by calling the mapReduce function on our collection. We pass in the map and reduce variables we created previously and specify the name of the collection that the result will be stored in; in this case ‘gender’. Just to reiterate, the result of calling the mapReduce function is a collection in our current database that we can iterate over just like we would any other collection. Have a look at the code below:

			var res = db.nettuts.mapReduce( map, reduce, { out : 'gender' } );
		

Displaying the output

When the Map-Reduce is completed, we can access the result just like a normal collection, by running the find function on it as we do below.

			db.gender.find();                                                                             
			{ "_id" : { "gender" : "f" }, "value" : { "count" : 2 } }
			{ "_id" : { "gender" : "m" }, "value" : { "count" : 5 } }
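To make the mechanics less of a black box, the whole run can be emulated in plain JavaScript: call map on every document, group the emitted values by key, then reduce each group. This is a simplified sketch, not how mongoDB implements it; the groups object and the stand-in emit are made up for the example, and the genders come from the dataset.

```javascript
var people = [
    { first: "matthew",   gender: "m" },
    { first: "james",     gender: "m" },
    { first: "arnold",    gender: "m" },
    { first: "tony",      gender: "m" },
    { first: "jamie lee", gender: "f" },
    { first: "michael",   gender: "m" },
    { first: "judi",      gender: "f" }
];

var map = function () {
    emit({ gender: this.gender }, { count: 1 });
};

var reduce = function (key, values) {
    var result = { count: 0 };
    values.forEach(function (value) { result.count += value.count; });
    return result;
};

// Stand-in emit(): collect every emitted value under its serialised key.
var groups = {};
function emit(key, value) {
    var id = JSON.stringify(key);
    if (!groups[id]) { groups[id] = { key: key, values: [] }; }
    groups[id].values.push(value);
}

// Map phase: one call per document, with `this` bound to the document.
people.forEach(function (doc) { map.call(doc); });

// Reduce phase: one result document per distinct key.
var output = Object.keys(groups).sort().map(function (id) {
    return { _id: groups[id].key, value: reduce(groups[id].key, groups[id].values) };
});

console.log(JSON.stringify(output));
// [{"_id":{"gender":"f"},"value":{"count":2}},{"_id":{"gender":"m"},"value":{"count":5}}]
```

One difference worth knowing: mongoDB may call reduce more than once for the same key, feeding earlier results back in, which is why the reduce function returns the same shape that map emits.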
		

Here, we now have a running total per gender: 2 for females and 5 for males, which matches our dataset. But what if we wanted to count only the females in the group? Well, not much to it. We only need to make use of the query option to do so. Lucky for us, it will look familiar. Have a look at the query below.

			var res = db.nettuts.mapReduce( map, reduce, { out : 'gender', query : { "gender" : "f" } } );
		

Now, when we display the output, it will look like that below:

			db.gender.find();                                                                             
			{ "_id" : { "gender" : "f" }, "value" : { "count" : 2 } }
		

There are a number of other parameters that you can pass to the mapReduce function to further customise the output.

  • sort – sort the input documents before they are passed to the map function
  • limit – limit the number of input documents passed to the map function
  • out – the name of the collection to store the results in
  • finalize – specify a function to run after the reduction process is complete
  • scope – specify variables that can be used in the map, reduce and finalize function scope
  • jsMode – Avoids an intermediate step (between map and reduce) of converting back to JSON format
  • verbose – track statistics about the execution process
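To get a feel for where some of these options act, here is a plain JavaScript emulation that supports query, limit and finalize. It is a sketch under simplifying assumptions: emulateMapReduce is a made-up helper, not part of mongoDB, and its query option takes a predicate function rather than a query document.

```javascript
var people = [
    { first: "matthew",   gender: "m" }, { first: "james",   gender: "m" },
    { first: "arnold",    gender: "m" }, { first: "tony",    gender: "m" },
    { first: "jamie lee", gender: "f" }, { first: "michael", gender: "m" },
    { first: "judi",      gender: "f" }
];

var map = function () { emit({ gender: this.gender }, { count: 1 }); };
var reduce = function (key, values) {
    var result = { count: 0 };
    values.forEach(function (value) { result.count += value.count; });
    return result;
};

// Stand-in emit(): collects values under their serialised key during a run.
var groups = {};
function emit(key, value) {
    var id = JSON.stringify(key);
    if (!groups[id]) { groups[id] = { key: key, values: [] }; }
    groups[id].values.push(value);
}

// Hypothetical helper, not a mongoDB API.
function emulateMapReduce(docs, options) {
    options = options || {};
    groups = {};
    var input = docs;
    // query and limit act on the input, before map sees any document.
    if (options.query) { input = input.filter(options.query); }
    if (options.limit) { input = input.slice(0, options.limit); }
    input.forEach(function (doc) { map.call(doc); });
    return Object.keys(groups).sort().map(function (id) {
        var value = reduce(groups[id].key, groups[id].values);
        // finalize runs once per key, after the reduction is complete.
        if (options.finalize) { value = options.finalize(groups[id].key, value); }
        return { _id: groups[id].key, value: value };
    });
}

var femalesOnly = emulateMapReduce(people, {
    query: function (doc) { return doc.gender === "f"; }
});
var firstThree = emulateMapReduce(people, { limit: 3 });
var plainTotals = emulateMapReduce(people, {
    finalize: function (key, reduced) { return reduced.count; }
});

console.log(JSON.stringify(femalesOnly)); // [{"_id":{"gender":"f"},"value":{"count":2}}]
console.log(JSON.stringify(firstThree));  // [{"_id":{"gender":"m"},"value":{"count":3}}]
console.log(JSON.stringify(plainTotals)); // [{"_id":{"gender":"f"},"value":2},{"_id":{"gender":"m"},"value":5}]
```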

Winding Up

This has been a whirlwind coverage of some of the more complex topics of mongoDB. But I hope that it’s given you even more of a taste of what’s possible through using this great tool.

We’ve looked at the conditional operators: $lt, $gt, $lte, $gte, $in, $nin, $not, worked with regular expressions and run through an introduction to MapReduce. I hope that you’ve gotten a lot out of this and will learn more about the great tool that is mongoDB.

  • http://www.udgwebdev.com Caio Ribeiro Pereira

    Nice post man!!!

    MongoDB is very easy to work!

    • http://www.maltblue.com Matthew Setter

      Definitely is. So flexible and simple to use.

  • erminio ottone

    The only thing i hate about nettuts series? 2 months between each part :(

    • http://www.maltblue.com/ Matthew Setter

      Erminio,

      yeah, that’s partly my fault. I should have followed up sooner.

  • http://bit.ly/cLZXGi Julian

    Great post man, when I finally get around to learning MongoDB this will be my first source!

    • http://www.maltblue.com/ Matthew Setter

      Julian,

      thanks kindly. mongoDB’s definitely flexible, especially when coming from a more traditional database background.

  • http://sebduggan.com Seb Duggan

    Useful little tutorial; only problem is with your updates: you rather assume that the record IDs will be the same on my computer as they are on yours – which they aren’t…

    • http://www.maltblue.com/ Matthew Setter

      Seb,

      yeah, I appreciate that oversight now that I’ve looked at it again.

    • http://www.renownedmedia.com Thomas Hunter

      The original author needs to update his updates so that instead of using the randomly generated IDs his machine passed out, it instead uses the first and last names of the previous tutorial.

      Or, even better, the author should take the time to copy and paste the inserts from the previous article and update them accordingly, which would save the reader from having to hunt down the previous article.

      The Regex is completely wrong as well (as several others have pointed out).

      Overall this is a really poor quality tutorial, I’m wondering if Envato reads them before allowing them to be published.

  • http://saidtazi.com/ kosaidpo

    good post , i cant wait to see how to handle relations in a very soon time if possible , thanks for sharing

    • http://zackperdue.com Zack

      There are no relations. Just embedded documents.

  • Dhruv Kumar

    Where’s my video man? (checkout http://www.youtube.com/watch?v=j4RmlKDScj0 for reference)

  • JT

    mongo is great once you understand its limitations and how the documents structure works.

    For instance, each document object is allocated a certain amount of space on the disk with a little extra room for small changes. When a doc is added to and grows in size beyond that natural ‘wiggle room’ the entire doc is moved to another part of the disk, which means a slower write execution. (more space can be pre-allocated to avoid it but may result in redundancy).

    Whereas updating a value with a similar sized value simply updates that part of the disk and doesn’t need the entire doc to be moved, which is the super quick way of doing things. Also, a document object can grow in size but doesn’t shrink which may result in redundancy depending on how the document structure is used.

    It’s something to bear in mind when designing a database for mongo as its not that clear in the documentation.

    Also worth noting that writes are fire-and-forget so the application layer needs to be a little more robust if you want to make sure a write completed correctly.

    • http://www.maltblue.com/ Matthew Setter

      Hey JT,

      thanks for mentioning this as it’s very well worth knowing and bearing in mind in a schema design.

  • http://www.jsxtech.com Jaspal Singh

    Great article.
    keep posting more…

  • Ridwan

    i was wondering how to make a validation for example username, since it generates object ID always unique not like RDBMS it will automatically block new record to be inserted if it’s same value? btw great tutorial keep posting

  • chantivlad

    “Let’s say that we want to look for all the people, who are either male or developers, they’re less than 40 years of age.”

    the code says

    "age" : { "$gt" : 40 }

    which is not really intuitive as one reads naturally “age greater than 40”.

    Plus you say the line after that: “then added an and condition of the age being greater than 4.”
    There is a typo (4 instead of 40), and it is not clear wether it is age > 40 or age < 40 really.

    I think you should replace "age" with real value, otherwise it is confusing that someone born at "dob" : "21/04/1978" is "age" : 18…

    Otherwise nice introduction.

  • chantivlad

    “have a first name starting with ‘ma’ or ‘to’ and who’s last names begin with ‘se’ or ‘de’”

    { "_id" : ObjectId("4ef22e532893ba6797bf8cbc"), "first" : "judi", "last" : "dench", "dob" : "09/12/1934", "gender" : "f", "hair_colour" : "white", "occupation" : "actress", "nationality" : "english" }

    judi does not start with ‘ma’ or ‘to’, does it?

    • http://ryanjodonnell.com Ryan

      Yea I was wondering the same thing…

    • http://www.maltblue.com/ Matthew Setter

      chantivlad/Ryan,

      sorry about that oversight. I’ll get the code correct shortly – and thanks for pointing it out.

    • http://9gag.com hpaul

      It’s correct because this command:
      db.nettuts.find( { "first" : /(ma|to)*/i, "last" : /(se|de)/i } );

      Select all items that contain in “first” (ma|to) and “last” (se|de) and if you see in the second result “last” contain “de”.

      Do not prejudge.

      • Zlati Pehlivanov

        well you forgot to put * in the regex for the last, that way it should select only last names equal to se or de.

  • apa

    The regular expression /(ma|to)*/i is wrong. If you want to match names starting with ‘ma’ or ‘to’ you need /^(ma|to).*/i

    I think your regex matches all strings since it matches an empty string and all strings contain an empty string. I didn’t try this out with MongoDB, though.

  • Francisc

    I think there’s a problem with the IDs and data from the previous article (unless that was updated after I went though it). The IDs are different, so the update to add age fails and also, the data isn’t the same (names, occupations etc).

    In other news, this is inaccurate:
    “(ma|to)* indicates that the start of the first name string must be either ‘ma’ or ‘to’. ”

    That matches ‘ma’ or ‘to’ anywhere in the string, zero or more times. To match the start you need /^(ma|to)/i.

    • Francisc

      Wow… “If you’re not familiar, the * at the end, will match anything after that. So when you put it together, we match first names that have either ‘ma’ or ‘to’ at the beginning of them. In the regex for the last name, you can see that we’ve done the same thing, but for the last name.”

      That’s not true… you probably meant “.”.
      “*” matches the previous thing in this case “ma” or “to” 0 or more times…

      • Francisc

        Your “james” reg exp is also wrong…

        “The question mark will match one character, whether a-z, A-Z or 0-9”.
        No, the question mark means “0 or 1 time”.

        Your query (/(jam?e*)*/i) means:
        find: ja(optional-M)(e-zero-or-more-times)-everything zero-or-more-times.

        What you need is this: /jam(\w?)e/i

  • http://www.jibo.ro Anunturi

    Nice post, gonna start a Q&A site soon using Mongo as a db backend. This should come in handy.

  • kshirod

    Hey thanks for such a good tutorial about Mongo.I liked both parts.I hope you will soon release the third part.Waiting for that

  • http://picturds.com/blog Markus Stenqvist

    I wrote a simple mongodb function to fix the age problem we all have going through this tutorial. It just randomizes an age for every object there is.

    Just copy and paste:

    function adding_ages() {
        db.nettuts.find().forEach( function(obj) {
            db.nettuts.update({'_id' : obj._id}, {'$set': {'age': Math.floor(Math.random()*40)}});
        });
    }

    Then just call it by writing:

    adding_ages();

    I hope someone will have use of this.

    // Markus

  • whatthef

    As somebody coming in with no background on mongo db, I was completely confused with the map/reduce part. Seems more background information was needed. What the f is emit? The appears to be a whole scripting language with functions that I don’t recall learning about previously. Perhaps I missed a section.

    • Ajedi32

      I’m not sure either. It looks like JavaScript…

  • Ajedi32

    This is a great article about MongoDB Matthew, but almost everything you said about Regular Expressions is wrong. (Sorry, but it’s true.) The `*` character matches 0 or more of whatever character is before it, and the question mark matches 0 or 1 of whatever character is before it. Also, /(se|de)/ will match `se` or `de` anywhere in the string, not just at the beginning of it.

    So `db.nettuts.find( { "first" : /(ma|to)*/i, "last" : /(se|de)/i } );` matches any record that has ANYTHING in the first name, and `se` or `de` anywhere in the last name.

    I believe what you meant in that case was: `db.nettuts.find( { "first" : /^(ma|to)/i, "last" : /^(se|de)/i } );`

    Here’s a link to a pretty basic tutorial on regular expressions: http://www.regular-expressions.info/quickstart.html