A year ago, I criticized the then under-development Falcon storage engine in MySQL 6.0 for failing to meet the demands of large-scale deployments. Falcon has now reached beta and is included in the MySQL 6.0 alpha releases, the most recent of which is 6.0.4 from this February. We're thinking of running an early test of Falcon in place of MyISAM/InnoDB for Habbo to see what to expect later on, so I reviewed the documentation again and revisited my concerns from a year ago.

Falcon now supports multiple tablespaces per database, although the corresponding manual page still opens with the unfortunately misleading sentence "all data ... is stored within a single file" and only corrects itself in the second paragraph. To ease volume management, ALTER TABLE can also migrate tables from one tablespace to another, but these are not online operations. For very large databases, an online backup plus offline restore, followed by binlog apply and switchover, would therefore result in less downtime. One tablespace does not (yet?) support multiple data files, so there's still some concern over performant and reliable storage of extremely large single tables.

Not directly related to (only) Falcon, MySQL 6.0 will have BACKUP/RESTORE DATABASE functionality that, at least on paper, tries to minimize downtime. However, with the current beta version, online backups are not supported for Falcon. It'll be interesting to see whether this will eliminate the need for hairy and failure-prone custom backup solutions in the future.

A read of the threading and commit model of Falcon still leaves me wondering whether really-high-end storage systems and machines with more than 8 cores are going to be fully utilized under strenuous I/O. While each execution thread schedules its own I/O to the serial commit log, only one thread manages the writes of committed data to the data files in order to free up space in the log. As I mentioned before, I/O systems exist that simply cannot be fully utilized by just one CPU doing the random access work. Witness also the degraded performance of the same I/O systems with the Linux 2.6 CFQ elevator's single sorted I/O queue, and the 25% higher throughput achieved with the random-order queue of the no-op elevator; i.e., optimizing too much to avoid random access can hurt you with large tertiary caches or SSD storage. Still, my impression this time is much better than on the first read of the alpha docs a year ago.
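To make the single-writer concern concrete, here is a back-of-the-envelope sketch in Python. It is only a toy model of my reading of the docs, not a description of Falcon's actual scheduling: the function name, the constant rates and all the figures are assumptions picked for illustration. It shows that once the combined commit rate of the worker threads exceeds what one flusher thread can write out, the backlog in the serial log grows without bound.

```python
def unflushed_log_mb(workers, commit_mb_s_per_worker, flusher_mb_s, seconds):
    """MB of committed data still waiting in the serial log after `seconds`.

    Hypothetical model: constant rates, an initially empty log, and a
    single flusher thread moving committed data to the data files.
    """
    produced = workers * commit_mb_s_per_worker * seconds
    drained = flusher_mb_s * seconds
    # The flusher cannot drain more than the workers have produced.
    return max(0, produced - drained)


if __name__ == "__main__":
    # Made-up figures: each worker commits 10 MB/s, the flusher manages 60 MB/s.
    for workers in (2, 4, 8, 16):
        backlog = unflushed_log_mb(workers, 10, 60, seconds=10)
        print(f"{workers:2d} workers: {backlog} MB backlog after 10 s")
```

With these invented numbers the log keeps up through six workers and falls behind from there on, no matter how much faster the storage underneath could go with more threads issuing writes.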

That leaves cache management as a big drawback compared to Oracle, DB2 and the like. The Falcon engine in one MySQL instance has only one index/record cache across all tablespaces, meaning that one bad query causing a table scan will still be able to wipe out all caches and buffers in the entire system, bringing performance to a halt for all users.
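The effect of one shared cache is easy to demonstrate with a small simulation. The sketch below assumes a plain LRU eviction policy, which is my simplification and not necessarily how Falcon scavenges its record cache; the class, the table names ("orders", "audit_log") and the sizes are all hypothetical. It warms a cache with a hot working set from one table, then scans a second table larger than the cache and counts how many hot pages survive.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU page cache shared by every table (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first

    def access(self, page):
        if page in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page)
        else:
            # Cache miss: load the page, evicting the oldest one if full.
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)


def simulate(capacity=100, hot_count=50, scan_count=1000):
    """Return (hot pages cached before the scan, hot pages cached after)."""
    cache = LRUCache(capacity)
    hot = [("orders", i) for i in range(hot_count)]

    # Normal traffic: the hot working set is touched repeatedly.
    for _ in range(3):
        for page in hot:
            cache.access(page)
    before = sum(1 for page in hot if page in cache.pages)

    # One bad query: a full scan of a table larger than the cache.
    for i in range(scan_count):
        cache.access(("audit_log", i))
    after = sum(1 for page in hot if page in cache.pages)
    return before, after


if __name__ == "__main__":
    print(simulate())
```

In this model the whole hot set is resident before the scan and none of it is afterwards, so every other user starts from a cold cache. Separate caches per tablespace, or a scan-resistant policy, are the usual ways to contain this kind of damage.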

Foreign keys are not yet supported either, so a full replacement for InnoDB cannot be tested at this point. Perplexingly, they are not mentioned in the GA roadmap either.

On a positive note, I'm glad to see Falcon will collect performance metrics into the information schema for flexible access.