Falcon database engine in MySQL 6.0 alpha
By Osma on Saturday 29 March 2008, 15:04
A year ago, I criticized the under-development Falcon storage engine in MySQL 6.0 for failing to meet the demands of large-scale deployments. Falcon has now reached a beta phase and is included in the MySQL 6.0 alpha versions, the most recent release of which is 6.0.4 this February. We're thinking of making an early test of Falcon in place of MyISAM/InnoDB for Habbo to see what to expect later on, so I reviewed the documentation again and decided to revisit my concerns from a year ago.
Falcon now supports multiple tablespaces per database, although the corresponding manual page still begins with the misleading sentence "all data ... is stored within a single file" and only corrects itself in the second paragraph. To ease volume management, it also allows ALTER TABLE to migrate tables from one tablespace to another, but these are not online operations, so for transferring very large databases, an online backup + offline restore followed by binlog apply and switchover would result in less downtime. One tablespace does not (yet?) support multiple data files, so there's still some concern over performant and reliable storage of extremely large single tables.
Not directly related to (only) Falcon, MySQL 6.0 will have BACKUP/RESTORE DATABASE functionality, which at least on paper tries to minimize downtime. However, with the current beta version, online backups are not supported for Falcon. It'll be interesting to see whether this will eliminate the need for hairy and failure-prone custom backup solutions in the future.
A read of the threading and commit model of Falcon still leaves me wondering whether really-high-end storage systems and >8 core systems are going to be fully utilized under strenuous I/O. While each execution thread schedules I/O to the serial commit log, only one thread manages the writes of committed data to the data files in order to free up space in the log. As I mentioned before, I/O systems exist that simply cannot be fully utilized by just one CPU doing the random-access work. Witness also the degraded performance of the same I/O systems with the Linux 2.6 CFQ elevator's single sorted I/O queue, and the 25% higher throughput achieved with the random-order queue of the no-op elevator; i.e., optimizing too much to avoid random access can hurt you with large tertiary caches or SSD storage. Still, my impression this time is much better than on the first read of the alpha docs a year ago.
That leaves cache management as a big drawback compared to Oracle, DB2 and the like. The Falcon engine in one MySQL instance has only one index/record cache across all tablespaces, meaning that one bad query causing a table scan will still be able to wipe out all caches/buffers for the entire system, bringing performance to a halt for all users.
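To make the concern concrete, here is a minimal sketch of scan-induced cache pollution. It assumes a plain LRU eviction policy, which is a stand-in for illustration and not a description of how Falcon's cache actually evicts pages. The table names, page counts, and cache sizes are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU page cache. A stand-in for illustration, NOT Falcon's algorithm."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> True, oldest entry first
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
        else:
            # Cache miss: load the page and evict the oldest one if over capacity.
            self.misses += 1
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)


def hot_misses_after_scan(hot_cache, scan_cache, hot_pages=100, scan_pages=10_000):
    """Warm the hot set, run a big scan, then count misses on the hot set."""
    for p in range(hot_pages):
        hot_cache.read(("users", p))          # hypothetical transactional table
    for p in range(scan_pages):
        scan_cache.read(("history_log", p))   # hypothetical large log table
    before = hot_cache.misses
    for p in range(hot_pages):
        hot_cache.read(("users", p))
    return hot_cache.misses - before


# One cache shared by everything: the scan pushes out the hot pages.
shared = LRUCache(1000)
shared_misses = hot_misses_after_scan(shared, shared)

# Same total memory, split into two regions: the scan cannot touch the hot pages.
hot_only, scan_only = LRUCache(500), LRUCache(500)
partitioned_misses = hot_misses_after_scan(hot_only, scan_only)

print("shared cache, hot-set misses after scan:", shared_misses)
print("partitioned cache, hot-set misses after scan:", partitioned_misses)
```

Under these assumptions the shared cache loses its entire hot set to the scan, while the partitioned setup keeps it. Real engines often use scan-resistant policies that soften this effect, so treat the numbers as an illustration of the failure mode rather than a measurement.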
Foreign keys are not yet supported, either, so a full replacement for InnoDB cannot be tested at this point. Perplexingly, these are not mentioned in the GA roadmap either.
On a positive note, I'm glad to see Falcon will collect performance metrics in the information schema for flexible access.
Comments
Unfortunately, you have a few of your facts wrong here.
The current Falcon release, version 6.0.4, has a pool of IO threads that can be configured. Thus you can spread the IO work over multiple CPUs.
Version 6.0.5 has row-level backup integrated with Falcon. This release is in production now. In the meantime, you can get a copy at http://forge.mysql.com/wiki/Falcon_...
There is a single cache for all tablespaces, but this cache is by no means blocking. A hash table exists to store a pointer to each page, and that short code path is the only cache-wide serialization point. Even this will get broken up in our next release (6.0.6). So there is no need to worry about a single Falcon cache. It is actually a benefit, allowing the work to be spread more evenly across the system resources.
Falcon scales much better than InnoDB with multiple cores and with increasing connections. It still has its bottlenecks, which are being identified and fixed. But its performance has already eclipsed InnoDB on 8-way systems where the database is fully cached. And the IO performance is improving.
Kevin Lewis
MySQL Falcon Team Lead
I have to object to "Falcon scales much better than InnoDB." Publish the benchmarks and their setups, please. The only benchmarks I've seen were not set up correctly. (Remaining anonymous because I'm not sure whether these benchmarks are supposed to be public.)
Kevin, thanks for taking the time to comment. As said, this was again purely based on a read of the documentation. Glad to hear you have implemented backups; you should update the docs as that is a major item.
The same goes for threads: it's still contradicted in http://dev.mysql.com/doc/refman/6.0... and leaves the reader confused.
As for the "single cache" item, I think you misread my meaning. Since a DBA cannot assign separate cache/buffer memory regions to particular tablespaces, accidents like "select * from history_log" wiping out everything else in the cache and causing excessive cache misses for transactional events can still happen. Yes, a single cache in theory "spreads more evenly", but in practice, this sort of situation can easily negate that.
I haven't seen those Falcon scaling benchmarks and would love to review them. However, as long as foreign keys remain unsupported, that's a pretty meaningless comparison for practical considerations.