Posted on

Using MongoDB requires thinking MongoDB, Meteor Live-Queries

mongopain-01

Ok let’s say you want to build a webapp.

If so, you might want to choose a database – and if you are about to make that choice, you might choose MongoDB. The reason why you chose MongoDB might be one (or more) of the following:

  • it’s popular, so it’s got a great community + documentation to help you out
  • it’s document-based, so you can grab and store your data-model “in one piece”
  • it’s supporting the way you’re naturally thinking about your data-model
  • it’s API can be best described as “easy to use”
  • you’re using Meteor
  • you’re using it for doing higher-level queries, like querying the volume of 3D geometries stored in the db

And those reasons might have been stronger than the reason for why to not choose MongoDB:

  • it’s got no join operations

What might happen to you (as it happened to me) is that the way you are structuring your data-model turns out to totally require joint operations, without joint operations and joined queries your project is likely to fail and/or all you ever wanted are joint operations. If so, the assumption that you naturally think “MongoDB” just because you are likely to think document-based just turned out to be wrong. Totally.

Want many-to-many?

mongopain-01

Mommy, where are my joins?

Because of the fact, that many-to-many relationships in databases can be handled with SQL joins, and MongoDB obviously does not support SQL joins (and probably never will), there are many articles out there with statements like “you never should use MongoDB” or even “MongoDB sucks” (too many to link them here). But in my eyes, they’ve just been using MongoDB all wrong. I chose to adress the issue by showing how to use it right. Still, there will be cases where you better go SQL, and there will be cases where you better go NoSQL, this article just wants to make the NoSQL path easier.

Let me first set up a little example of a mailbox-implementation whithin some kind of social webapp. There will be:

  • one collection containing one document per registered user on that specific webapp. One of those documents could look like this

  •  one collection containing one document per message, that has been sent from one user to another, one of those documents could look like this, assuming that you want to link the users to the message, i.e. for dealing with updated usernames, avatars or online-states (hint: this approach is wrong and leads directly to mongo-hell):

In the snipped above, the sender (from) and the receivers (to) of the message are linked by their id. Whenever someone wants to read the message, we would do one query to the “messages”-collection and several queries to the “users”-collection to generate some html like this:

date: Friday, January 10 2014 at 14:40
from: Dan Jackson
to: Banana Joe, Iron Man
subject: meeting tomorrow

just wanted to remind you guys about our meeting tomorrow at 2 pm.

And whenever we want to perform a full text search within the “messages”-collection, we will have to perform a full text search in all linked collections (i.e. the “user”-collection), too, and then perform a search matching those secondary linked IDs with our actual query in the “messages”-collection. This is when you start noticing that simply linking documents by their ID sucks (or “Why you should never use MongoDB”). And the reason why it sucks is because the philosophy of MongoDB is, to store and retrieve data-models “as-they-are”. Now, linked IDs are totally not “as-they-are”, as long as your friends do not show up at your place in form of 17 character long alphanumeric strings, but with names and faces (respectively “avatars”).

think of your data “as-it-is”, or “as-it-matters”, not “as-it’s-linked”

So this is where we have to start to “think MongoDB” and make sure, our data is available “as-it-matters”. This is actually way simpler than most people think, and even simpler if you use a framework like Meteor that offers so called Live-Queries.

First, we will have change the data model of our message to “as-it-matters-to-us”. If you look at this closely, you might say “hey, but now the IDs do not matter to us anymore, why do we keep them in the model?” – well, they will matter later when we sync this piece of copied data to its original.

Now, all that’s left to do is taking care of synchronisation between collections, since we copied data from the “users”-collection to the “messages”-collection. This is not as complex as it seems, and it becomes even easier using either Meteor’s Live-Queries or a similar concept of another framework. It just takes you this short code snippet to permanently sync those two:

So syncUserMessages is our Live-Query that will run forever (or until stop() is called on it). Whenever a user changes one of the fields in the query (“profile.lastName”, “profile.firstName” or “profile.avatar”) those fields will be updated in all messages the user has ever sent or received. This simple method is more efficient than it looks – for MongoDB it’s easy to update and store those documents, and at least for Meteor Live-Queries it’s easy to efficiently keep track of the sync, since it’s using Oplog tailing since version 0.7.0.1. This method will work great for all cases, where the both following things do not come together:

  • linked data change happens very, very often (usernames, profile pictures)
  • the number of links within or to one document is very, very large

When both come together, it’s most ceraintly because you’re dealing with many-to-many relationships in your database, and for all those other cases, especially many-to-many, it will still work great if you schedule the syncing operation to an manageable intervall, say once every day. Think about the above example: It does not really matter if a changed username reflects in the message database immediately.

No Links != NoSQL

And still, if this does not suit some of your needs, you are still free to link IDs and perform a second database query. There are some cases to do so, but far more not to. A typical scenario where you would still create ID links could be an online-indicator, that tells the user weather the sender of a message is currently online or offline. You would not want to update hundrets of messages whenever a user goes online – therefore you’ll just link the ID of the user, perform another query to find out the online/offline state, and there you go.

Need Joins = need SQL

And finally, if there are still cases where you really cannot live without SQL joins, then, maybe, you should go SQL. There are other difficulties waiting for you, like dealing with performance when non-stop normalizing / denormalizing data, or real-time-data.

I hope this method saves you some nerves when working with MongoDB.

Leave a Reply

Your email address will not be published. Required fields are marked *