Earlier today, I mentioned querymongo.com in a tweet that fired off a chat covering various database-related topics each worth a blog post of their own, some of which I've written about here before:
— Osma Ahvenlampi (@osma) December 12, 2012
One response in particular stood out as something I want to cover in a bit more detail that will fit in a Tweet:
@osma I went on record in a Tietokone-column claiming that MongoDB + Postgres is a good combo :-) Best of both worlds.
— Kenneth Falck (@kennu) December 12, 2012
- Drive yourselves crazy by trying to report from MongoDB, using Mongo's own query tools.
- Push off reporting to a 3rd party service (which can be a very, very good idea, but difficult to retrofit to contain all of your original data, too).
- Replicate the structured part of your database to another DBMS where you can do SQL or something very SQL-like, including reasonably accessible aggregations and joins.
The third option will unfortunately come with the cost of having to maintain two systems and making sure that all data and changes are replicated. If you do decide to go that route, please do yourself a favor and pick a system designed for reporting, instead of an OLTP system that can simply do reporting, when pushed to do so. Yes, that latter category includes both Postgres and MySQL - both quite capable as OLTP systems, but you already decided to do that using MongoDB, didn't you?
Most reporting tasks are much better managed using a columnar, analytics-oriented database engine optimized for aggregations. Many have spawned in the last half a decade or so: Vertica, Greenplum, Infobright, ParAccel, and so on. It used to be that choosing to use one might be either complicated or expensive (though I'm on record saying Infobright's open source version is quite usable), but since last week's Amazon conference and its announcements, there's a new player on the field: Amazon Redshift, apparently built on top of ParAccel, priced at $1000/TB/year. Though I've yet to have a chance to participate in its beta program and put it through its paces, I think it's pretty safe to say it's a tectonic shift on the reporting databases market as big or bigger as the original Elastic Compute Cloud was to hosting solutions. Frankly, you'd be crazy not to use it.
Now, reporting is reporting, and many analytical questions businesses need to solve today really can't be expressed with any sort of database query language. My own start-up, Metrify.io is working on a few of those problems, providing cloud-based predictive tools to decide how to serve customers before there's hard data what kind of customers they are. We back this with a wide array of in-memory and on-disk tools which I hope to describe in more detail at a later stage. From a practical "what should you do" point of view though -- unless you're also working on an analytics solution, leave those questions to someone who's focused on that, turn to SaaS services and spend your own time on your business instead.