Fishpool

To content | To menu | To search

Tag - speak

Entries feed - Comments feed

Wednesday 8 April 2009

Using the Infobright Community Edition for event log storage

Apart from the primary "here's how we ended up using Infobright for data warehousing and how is that working out" topic I'm going to discuss in my MySQL Conf presentation I'll touch on another application, the use of Infobright's open-source Community Edition server for collection and storage of event logs. This is a system we've implemented in the past couple of months to solve a number of data management problems that were gradually becoming problematic for our infrastructure.

We've traditionally stored structured event log data in databases for ease of management. Since Habbo uses MySQL for most everything else, putting the log tables in the same databases was pretty natural. However, there are significant problems to this approach:

  • MyISAM tables suffer from concurrency issues and crash-recovery is very slow due to table consistency check
  • InnoDB tables suffer from I/O bottlenecks and crash-recovery is very slow due to rollback segment processing
  • Both scale badly to hundreds of millions of rows (especially if indexing is required), and mixing them is not a recommended practice
  • Storage becomes an issue over time, especially as indexes can easily require many times as much disk than the data, and an event log is going to have a LOT of rows
  • Partitioning has only recently become available, and before that, managing "archive" tables needed manual effort
  • Perhaps worst of all (as it's very hard to measure), if any of this is happening on the primary DB servers, it's competing for buffer pool memory with transactional tables, thus slowing down everything due to cache misses

Over the years, we've tackled these issues in many ways. However, with our initial experience of scaling an Infobright installation for data warehousing needs, a pretty simple solution became apparent, and we rapidly implemented an asynchronous, buffered mechanism to stream data into an ICE database. We're early with this implementation, but it has turned out to be a satisfactory high-performance solution. Even better, it's a very simple thing to implement, even in a clustered service spanning many hosts, as long as log tables don't need to be guaranteed 100% complete or up-to-date to the last second. Here's a description of the simple solution; extending that to the complex solution providing those guarantees is left as an exercise to the reader.

Rather than running single INSERTs to a log table or writing lines to a text file log, each server buffers a small set of events, eg for the past second in a memory buffer. These are then sent over a message bus or lightweight RPC call to a log server, which writes them to a log file that is closed and switched to a new file after every megarow or every few minutes, whichever is smaller. A second process running on this log server wakes up periodically and loads each of these files (minus the last one, which is still being written to) into the database with LOAD DATA INFILE.

This has multiple general benefits:

  • Buffered messaging requires much less time on the "client" servers compared to managing database connections and executing thousands of small transactions
  • The asynchronous processing ensures database contention can not produce random delays to the normal service operation
  • Batch loading of text files is implemented by every DB server, so there's little in this implementation that is proprietary or dependent on any particular DB solution
Using the Infobright ICE as the backend database provides a number of additional specific benefits:
  • Excellent data load performance
  • No index management, yet capability to run queries on the data without first extracting it to another DB
  • No degradation of performance as deployment size grows, as would happen even to a MyISAM table should it have any indexes
  • Compressed storage, so less spinning iron required
  • Columnar datapack organization should not require table partitioning even over long periods
This works very well for structured events. For unstructured data, a different solution is required, which I will discuss at some later date.

Update: Mark Callaghan asked in the comments for some quantified details. We have not spent the time to produce repeatable benchmarks, so all I can offer on that front is anecdotal data - it's very conclusive for us, given it's addressing our real concerns, but less so for others. That said, ICE does not support inserts, only batch loads, so the solution had to be engineered to use that, which added some complexity, but brought orders of magnitude more performance. A simple benchmark run showed that the end-to-end performance for this exceeded 100,000 events per second when running all parts of the client-logserver-database chain on a single desktop machine.

Query performance depends on the queries made. Summary data is 2-3 orders of magnitude faster to access, the bigger the dataset, the bigger the performance benefit - but expecting that for single row accesses would disappoint badly. Storage compression varies wildly depending on the data in question -- we've seen up to 15:1 compression on some real-world data sets, but others (such as storing email addresses in a varchar column) actually expand on storage. This is why I think of this as a solution for structured, quantified event logs, not for general unstructured log file storage.

Tuesday 26 February 2008

A look back at GDC, and forward to ION

Much to my regret, I had to miss GDC San Francisco this year, but I've been following with interest some of the session transcripts and got feedback from my colleagues who just returned. One thing I noted in particular was Raph Koster's comments on the iteration speed of web developers (measured in days  or weeks typically) vs that of game studios (where the difference might be six months for a casual Wii title and four years for a triple-A PS3 title -- of course iterating inside the development team, but with little consumer feedback). It seems Raph has taken a lot of this onboard in the development of Metaplace, as seen in their pre-release "postmortem" session.

Of course, I noted this because this pretty much reflects how we (Sulake) have been modeling not just our development process but our business management methods driving that development. That is, make the iteration cycle ever faster, learn to do big changes in very, very small chunks, incorporating metrics- and testing-based learning all through the cycle.

Also, I'm going to be talking about this very topic this May in ION 08 in Seattle. Looks like I missed a lot of interesting discussion relevant to that session, so I hope I won't be repeating too much of what was already stated in GDC.