If you happen to be interested in the technologies that enable advanced business analytics, as I am, the last year has been an interesting one. A lot is happening at every level of the tech stack, from raw infrastructure to cloud platforms to functional applications.
As Hadoop has really caught on and is now a building block for even conservative corporations, several of its weaknesses are also beginning to be tackled. From my point of view, the most severe has been the terrible processing latency of the batch- and filesystem-oriented MapReduce approach, as opposed to solutions designed on top of streaming data. That's now being addressed by several projects. Storm provides a framework for dealing with incoming data, Impala makes querying stored data more processing-efficient, and finally, Parquet is coming together to make the storage itself more space- and I/O-efficient. With these in place, Hadoop will move from its original strength in unstructured data processing to a compelling solution for dealing with massive amounts of mostly-structured events.
Those technologies are a bear to integrate and, in their normal mode, require investment in hardware. If you'd prefer a more flexible start to building a solution, Amazon Web Services has introduced a lot of interesting stuff, too. Not only have the prices for compute and storage dropped, but Amazon now also offers I/O capacities comparable to dedicated, FusionIO-equipped database servers, very cost-efficient long-term raw data storage (Glacier), and a compelling data warehouse/analytics database in the shape of Redshift. The latter is a very interesting addition to Amazon's already-existing database-as-a-service offerings (SimpleDB, DynamoDB and RDS), and, as far as I've noticed, gives it a unique capability other cloud infrastructure providers are today unable to match - although Google's BigQuery comes close.
The next piece in the puzzle must be analytical applications delivered as a service. It's clear that the modern analytics pipeline is powered by event data, whether it comes from web clickstreams (Google Analytics, Omniture, KISSMetrics or otherwise), mobile applications (such as Flurry, MixPanel, Kontagent) or internal business data. It's significantly simpler to produce a stream of user, business and service events from the operational stack than it is to try to retrofit business metrics on top of an operational database. The '90s-style OLTP-to-OLAP Extract-Transform-Load approach must die!
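To make the event-stream idea a bit more concrete, here's a minimal sketch in Python of what I have in mind. Everything in it is my own illustration rather than any particular product's API: the `Event` shape, the event names ("login", "purchase"), the `amount` property and the sample data are all hypothetical. The point is only that once the operational stack emits plain events, business metrics become simple aggregations over that stream.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Event:
    """One user, business or service event (hypothetical shape)."""
    timestamp: str            # ISO 8601 string, e.g. "2013-05-01T09:30:00Z"
    user_id: str
    name: str                 # hypothetical event names: "login", "purchase"
    properties: Dict[str, float] = field(default_factory=dict)


def daily_active_users(events: Iterable[Event]) -> Dict[str, int]:
    """Count distinct users per calendar day (first 10 chars of the timestamp)."""
    users_by_day = defaultdict(set)
    for event in events:
        users_by_day[event.timestamp[:10]].add(event.user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def daily_revenue(events: Iterable[Event]) -> Dict[str, float]:
    """Sum the assumed "amount" property of "purchase" events per day."""
    revenue = defaultdict(float)
    for event in events:
        if event.name == "purchase":
            revenue[event.timestamp[:10]] += event.properties.get("amount", 0.0)
    return dict(revenue)


# Made-up sample stream, standing in for events emitted by the operational stack.
SAMPLE_EVENTS = [
    Event("2013-05-01T09:30:00Z", "alice", "login"),
    Event("2013-05-01T09:45:00Z", "alice", "purchase", {"amount": 20.0}),
    Event("2013-05-01T11:00:00Z", "bob", "login"),
    Event("2013-05-02T08:15:00Z", "alice", "purchase", {"amount": 5.0}),
]

if __name__ == "__main__":
    print(daily_active_users(SAMPLE_EVENTS))
    print(daily_revenue(SAMPLE_EVENTS))
```

Neither metric needs to know anything about the schema of the operational database, which is exactly why I find this approach so much easier than retrofitting reports on top of it.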
However, the services I mentioned above, while excellent in their own niches, cannot produce a 360-degree view across the entire business. If all they deliver is dashboards, customer-level insight is impossible. Even if they're able to report on customers, they don't integrate with support systems. They leave holes in the offering that businesses have to plug with ad-hoc tools. While that's understandable, as they're built on technologies that force nasty compromises, those holes are still unacceptable for a demanding digital business of today. And as the world turns increasingly digital, what's demanding today is going to be run-of-the-mill tomorrow.
Fortunately, the infrastructure is now available. I'm excited to see the solutions that will arrive to make use of the new capabilities.