Earlier today, I mentioned querymongo.com in a tweet that fired off a chat covering various database-related topics each worth a blog post of their own, some of which I've written about here before:

One response in particular stood out as something I want to cover in a bit more detail that will fit in a Tweet:

While it's fair to say I don't think MongoDB's query syntax is pretty in the best of circumstances, I do agree that at times, given the right kind of other tools your dev team is used to (such as, when you're developing in a JavaScript-heavy HTML5 + Node.js environment) and the application's context is one where objects are only semi-structured, it can be a very good fit as the online database solution. However, as I was alluding to in the original tweet and expounded on in its follow-ups, it's an absolute nightmare to try to use MongoDB as the source for any kind of reporting, and most applications need to provide reporting at some point. When you get there, you will have three choices:

  1. Drive yourselves crazy by trying to report from MongoDB, using Mongo's own query tools.
  2. Push off reporting to a 3rd party service (which can be a very, very good idea, but difficult to retrofit to contain all of your original data, too).
  3. Replicate the structured part of your database to another DBMS where you can do SQL or something very SQL-like, including reasonably accessible aggregations and joins.

The third option will unfortunately come with the cost of having to maintain two systems and making sure that all data and changes are replicated. If you do decide to go that route, please do yourself a favor and pick a system designed for reporting, instead of an OLTP system that can simply do reporting, when pushed to do so. Yes, that latter category includes both Postgres and MySQL - both quite capable as OLTP systems, but you already decided to do that using MongoDB, didn't you?

Most reporting tasks are much better managed using a columnar, analytics-oriented database engine optimized for aggregations. Many have spawned in the last half a decade or so: Vertica, Greenplum, Infobright, ParAccel, and so on. It used to be that choosing to use one might be either complicated or expensive (though I'm on record saying Infobright's open source version is quite usable), but since last week's Amazon conference and its announcements, there's a new player on the field: Amazon Redshift, apparently built on top of ParAccel, priced at $1000/TB/year. Though I've yet to have a chance to participate in its beta program and put it through its paces, I think it's pretty safe to say it's a tectonic shift on the reporting databases market as big or bigger as the original Elastic Compute Cloud was to hosting solutions. Frankly, you'd be crazy not to use it.

Now, reporting is reporting, and many analytical questions businesses need to solve today really can't be expressed with any sort of database query language. My own start-up, Metrify.io is working on a few of those problems, providing cloud-based predictive tools to decide how to serve customers before there's hard data what kind of customers they are. We back this with a wide array of in-memory and on-disk tools which I hope to describe in more detail at a later stage. From a practical "what should you do" point of view though -- unless you're also working on an analytics solution, leave those questions to someone who's focused on that, turn to SaaS services and spend your own time on your business instead.