Fishpool


Tuesday 12 May 2009

Confusing Sun communication about MySQL 5.4

Just received an email newsletter from Sun titled "MySQL 5.4 Preview Release" which states:

Sun Microsystems recently released MySQL 5.4, delivering performance and scalability improvements enabling the InnoDB storage engine to scale up to 16-way x86 servers and 64-way CMT servers.

MySQL 5.4 also includes new subquery optimizations and JOIN improvements, resulting in 90% better response times for certain queries.

Apparently, the confusion about the contents of the release I wrote about earlier continues to reign inside Sun as well. MySQL 5.4 has not been released by any reasonable meaning of the word, since there's "only" a preview available at this time. Compare this to Windows 7: that's already a Release Candidate, but it has not been released. Also, the preview release available includes neither the new subquery optimizations nor the JOIN improvements. Having planned such improvements doesn't count.

As I wrote earlier, the best of the rather bad excuses for the release labeling offered to me was that Sun wanted to avoid confusion by not releasing many versions at once. I think that got replaced (and then some) by plenty of extra confusion about when and what was released, instead. Sorry, no good. Try again, 'kthxbye.

Monday 4 May 2009

What does Oracle mean for Java?

Over the past two weeks I've been mostly focused on MySQL, but the big-ticket item in the Sun/Oracle deal is not databases, it's Java. However, it's also the domain which is far less clear to predict. It was a big deal when Sun decided to open source Java, but the fact of the matter is that the first fully open source release isn't out yet, and Sun has been keeping the testing and certification kit off-limits for open source communities. This means it would still be far too easy for OpenJDK to be killed off.

I've been keeping clear of Oracle for several years, and can't even begin to guess what their position on this is. Oracle has been a pretty active contributor to Linux in particular for several years, and I'm sure their open source strategy and how it works together with their business is pretty well established within at least the engineering parts of the company. At the same time, their notoriously aggressive market tactics make sure that everyone's wary of their next move. Java is a huge part of Oracle's business, and after they purchased BEA, I wouldn't be surprised if Oracle were already the biggest Java company (in terms of revenue), ahead of both Sun and IBM. After completing the Sun acquisition, that'll be guaranteed.

That's a big balance shift for the overall Java community. Now, Oracle is a smart company. My worry is they might emphasize short-term tactical market advantage (owning all of Java, JRockit, Glassfish and WebLogic to compete against other middleware and business applications) over long-term strategic benefit of a unified platform competing with .NET and the host of open source platforms from PHP and Ruby to Python. With such a wide field, following up on, and improving on the open source platform process would be the right thing to do - and it would help me :)

Thursday 30 April 2009

The difference between conversion and retention

Picked up a piece of analysis today from my newsfeed regarding Twitter audience. Nielsen has posted information about Twitter's month-to-month retention (40%) and compared that to Facebook's and MySpace's. Pete Cashmore over at Mashable promptly misread the basic information and came to an entirely wrong conclusion about the stats, titling his post about it as "60% quit Twitter in the first month". A simple misunderstanding of basic audience analysis like this is the crucial difference between explosively growing traffic and a failure. That's a fail for you, Pete.

What's wrong? Well, retention is a separate matter from conversion. 40% conversion from a trial registration to being a continuing active user to the second month would not be a bad conversion rate. It's not stratospherically great, I've seen better, but I wouldn't be terribly unhappy about such a figure. However, Nielsen didn't say anything at all about first-to-second month conversion. This is what they DID say: "Twitter’s audience retention rate, or the percentage of a given month’s users who come back the following month, is currently about 40 percent."

That's pretty plain English when you take the time to read it. Month to month, regardless of visitor lifetime, not first to second month. On this metric, 40% retention is not good at all, and will definitely be a limiting factor to Twitter's traffic and audience size over time, just as the Nielsen article points out (and shows the math for). For any given retention rate, there is a certain maximum audience reach beyond which new traffic can't overcome the leaving base, since new traffic is not an inexhaustible supply.
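To make that ceiling concrete, here is a minimal sketch in Python. It assumes a deliberately simple model of my own, not necessarily the one Nielsen used: every month the same number of new users shows up, and a fixed share of last month's audience comes back. The function names and the example figures are made up for illustration.

```python
def steady_state_audience(new_users_per_month: float, retention: float) -> float:
    """Audience size at which arrivals exactly replace the users who leave.

    Assumes a constant monthly inflow and a flat month-to-month retention
    rate (a simplification chosen for this example).
    """
    if not 0.0 <= retention < 1.0:
        raise ValueError("retention must be in the range [0, 1)")
    # At the ceiling: audience = audience * retention + new_users_per_month
    return new_users_per_month / (1.0 - retention)


def simulate(new_users_per_month: float, retention: float, months: int,
             start: float = 0.0) -> list[float]:
    """Month-by-month audience under the same simplified model."""
    audience = start
    history = []
    for _ in range(months):
        # Returning users from last month plus this month's new arrivals.
        audience = audience * retention + new_users_per_month
        history.append(audience)
    return history


if __name__ == "__main__":
    # Hypothetical inflow of 6 million new users a month at 40% retention.
    print(steady_state_audience(6.0, 0.4))
    print(simulate(6.0, 0.4, months=12)[-1])
```

Under these assumptions the audience tops out at the monthly inflow divided by (1 - retention), so at 40% retention the total can never exceed about 1.67 times the number of new users arriving each month, no matter how long the service runs. Once the supply of new users starts to dry up, the ceiling drops with it.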

And since today is a busy day, that concludes the free startup advice. Take the time to understand the difference between these metrics, you'll thank yourself for it later.

Tuesday 28 April 2009

The MySQL community outlook

While I can not consider myself a member of MySQL's community of developers, I've been watching those developments the same way I follow the development of Linux and many of the Java and Apache projects our own services depend on. It was great to meet many of the core members of the development community and get some insight into their thoughts about the future.

Baron Schwartz called in his Percona Performance Conference keynote on Thursday for a new, active MySQL community to take the driver's seat in the development of the database, not just in the incremental way of bug fixing and performance improvement, but also by setting a vision for the next generation MySQL. It's a call to action greatly needed, and an important one despite the active existence of the Drizzle project. This is because while Drizzle already has a vision for the future, it's a radical diversion for the MySQL userbase and one which will not necessarily have a smooth upgrade path. Many of the same MySQL users feeling most of the pain of MySQL's current limitations are also those who will not be able to easily upgrade to a radically different architecture due to the amount of data and dependencies in their existing infrastructure.

It's a gap which needs a careful approach of incremental changes to the MySQL base functionality to help users bridge over to a new, brighter future. These changes do not need to be slow. Rapid incremental changes are likely to be easier to digest when there is a clear upgrade and downgrade path from iteration to iteration. That leaves the organizations with the biggest infrastructures free to set their own pace through the transition, rather than being forced to take one huge leap and risk crashing into the concrete wall of unexpected incompatibility.

A few such pieces of incremental community improvement I learned a great deal about during the week were the performance and scalability improvements by Google and Percona and their MySQL 5.4 equivalents; the Xtrabackup utility, which is not only an alternative to but an improvement on the Innobackup tool, which has significant limitations in large-scale deployments; and the Tungsten Replicator, which provides useful cross-database replication and rapid failover features that help with upgrades and transitions to new database installations while minimizing downtime and impact to users. I'm also curious about the storage engine development by Primebase - I don't think there's ultimately a lot of room for multiple transactional storage engines, but as a competitive research topic, it's certainly good to see alternatives to InnoDB.

[Be sure to check out my earlier posts of the conference learnings as well!]

Monday 27 April 2009

Database innovation on MySQL

If MySQL's core server development and release process has been somewhat of a frustration to the userbase over the past few years, clearly another part of the ecosystem has thrived in ways which brought exciting fruit to the Expo part of this year's conference. MySQL has become a hub of innovation in both transactional and analytics databases in ways which have turned many of my concerns to enthusiasm.

I've already discussed the technologies for data analytics on MySQL, in particular Infobright's storage engine technology. This year I took the opportunity to learn a bit more about their appliance-based competitor Kickfire as well, and it certainly looks like a solid product. I still don't completely understand what the "SQL chip" in their appliance does, but certainly the combination of special-purpose columnar storage, a high-speed memory interface and high-performance indexing should form the basis for a great analytics system. How it compares in practice to Infobright's software-only approach, time will tell. I'd be interested in real-world experiences, so if you have some to share, please get in touch. Finally, I missed the Calpont info myself, but once it is released, I'll try to get the time to try it out.

I'm even more excited about the new solutions on the transactional side of things. I've certainly been among the people frustrated by MySQL/InnoDB's scaling issues on modern hardware, and glad to see that the optimization work done by Google, Innobase and Percona is being accepted to the "mainline" MySQL Enterprise Server. However, what I did not expect to see were the solutions shown by Virident and Schooner for accelerated, Flash-based storage appliances. It's interesting how both of these companies have chosen to apply their platforms to accelerate both InnoDB and Memcached, and I'm looking forward to the chance to spend more time with both solutions. While both are Flash-based approaches, they seem to have taken very different architectural choices in the way they're exposing the memory to the software layer, and I'm curious to see the impact those choices have on both IO and storage capacity scaling. In any event, these are unique technologies unlike what I've seen for other platforms at this time. I need to learn how they plan to work with the community and Sun/Oracle in keeping the solutions functionally compatible with standard MySQL server.

The ecosystem doesn't end at the appliances, though. On the software side of things, I was pleasantly surprised by the state of Primebase's PBXT storage engine as well as Continuent's new Tungsten Replicator. While both are still early in their development path, they seem to hold a lot of promise for improving the performance of MySQL's built-in functionality in InnoDB as well as in the replication subsystem. Robert Hodges's demo of Tungsten's set-up and management also looked like it will greatly simplify replication administration, which is a big deal for anyone who has to manage 20+ replicated database systems. What's more, if Robert and his team crack the multi-threaded replication problem, a major scalability concern is lifted.

[Be sure to check out my earlier posts of the conference learnings as well!]

Sunday 26 April 2009

MySQL 2009-2010 roadmap

The development model for MySQL Enterprise took a big step forward with the new community process Karen Padir announced in her Tuesday keynote. This is great for both the open source server as well as enterprise customers, because the closer the tie between the community and the development path, the better the quality and faster the progress towards new functionality. I'm not entirely sure everyone at Sun still completely understands why a working community process is a benefit for the enterprise customer base, but I'm happy steps are made in the right direction, and it seems to me that Karen Padir is going to be a good leader for the product.

A big improvement, for sure, and still there's more to improve here. To borrow the words of Baron Schwartz, MySQL currently "has" a community, while it would really be to everyone's benefit if instead MySQL would "be" a community. I would suggest that the goal should be not monthly "community" releases from Sun, but a completely out-in-the-open development process with the community members in the driver's seat regarding patch acceptance, quality management and releases, much like the Fedora process works. Sure, there's a role for corporate sponsorship and project management, but it's a distinct difference of responsibility. The Drizzle project is another good example of how this can work. An important point to realize here is that there is a difference between the community, an active partner in the process of making the software better, and the unpaid userbase. The latter is an acquisition and conversion vehicle for the former, but they're separate entities.

The announcement of the 5.4 server was at the same time an encouraging and a confusing example of the changes. I would like to be enthusiastic about it, but we've seen MySQL (if not Sun) pre-announce releases that didn't appear before, and it's a long way to the promised release time. I asked two questions of many, many MySQL staff members during the week: why is it that 5.4 was announced now, but is slated to be released GA only in December when it clearly demonstrates massive scalability improvements already, and why is it that the feature list for the final 5.4 release is much longer than what's already completed? I did not get a really coherent answer from anyone. Best I could decipher, there is somewhere a faceless "marketing" which decided that a) there should only be one release announced and b) 40% demonstrated improvement is not good enough when it's not the only improvement that can be made. I also learned that it's not unlikely that much of the work which has gone to 5.4.0-beta would be backported to the 5.1 branch and released in a 5.1 point release before the actual 5.4 release, because in fact the changes can be considered bugfixes.

I consider myself not entirely inexperienced in the decision processes for release management, and know intimately the clarity hindsight provides to well-intentioned choices made with the best available information. I know there are many areas to consider, and every decision made is a compromise. I still can't bring myself to completely understand what exactly led to this particular approach. Let's recap:

  • Improvements already made are announced and made available in beta test form, but beta does not contain everything planned for the release
  • Final release is intentionally delayed by 7 months adding significant project risk to it, despite having no previously committed release schedule
  • Former release version is planned to be improved by making significant performance-altering changes in a point release in order to offset the delay
  • Such a release adds risk to maintenance roadmap and steals away upgrade motivation from the upcoming version

How this plan serves either Sun, the community, the free userbase or the enterprise customers is a mystery to me. It would certainly seem far simpler and clearer to take an aggressive quality assurance and release testing position with the intent to push 5.4 out as a rock-solid replacement upgrade to 5.1 as soon as possible, and only then continue with further updates as a 5.5 release. This would definitely be welcomed by everyone but the class of enterprise customers who like to hear about future versions two years in advance - but keep in mind that such conservative enterprises are not MySQL's primary customer base anyway, and if MySQL is to make inroads there, rapidly improving the quality and performance of the product in the meantime would still be a sensible step.

There is the argument that if I want to get those performance features now, I can use Percona/XtraDB or MySQL 5.1 plus the InnoDB Plugin. While technically that route does work, and clearly is worth pursuing as a user, it does have its drawbacks in terms of requiring multiple sources and it's hard to see how it supports MySQL/Sun's commercial interests, the latter surely having been a consideration in the 5.4 release plans.

Thus far in the argument I have ignored one new component - Oracle. That's because to my understanding the process I've discussed did not consider the acquisition, which was unknown to most people before Monday. Clearly this changes a few points. It's not necessarily in the interests of Oracle for MySQL to continue making inroads to enterprise customers, though if someone's going to be cannibalizing Oracle's database sales, it might as well be Oracle. InnoDB Plugin will also be a product from the same company as MySQL Server in the near future - in fact, in a future likely to be fact before the final GA release of MySQL 5.4. What is the role of a delayed 5.4 release in this equation, then?

Recap of MySQL Conference 2009

This was an interesting week for sure. Of course, we all know it started with a bit of shocking news, but that's not nearly the most interesting bit about the conference. I'm posting a series of cleaned-up notes and opinions about what I saw there as I finish them. Will also try to link to further information where I've seen good notes. Please leave more links in the comments if you have any!

Thursday 23 April 2009

Three domains of data

My MySQL Conference presentation on Tuesday discussed my practical findings on how Infobright's technology works in developing a MySQL-based data warehouse. I also touched on the higher-level question of how to select a technology for different kinds of data-related problem areas, and this article expands on that discussion.


Wednesday 22 April 2009

Mining for insight - presentation materials

Completed my MySQL Conference presentation 45 minutes ago. Seemed to go over ok, got some followup questions. Trouble is, I got hit by amazing jetlag half an hour before the session, and almost fell asleep myself during the presentation. Fortunately, survived that anyway, and as far as I could see, was the only one having problems staying awake. Below is an embedded version of the slides, which should also appear on the conference proceedings site later. Now for a beer at the expo. Will blog with more description of the stuff later (update: see this follow-up article).

Read this doc on Scribd: Mining for insight

Tuesday 21 April 2009

Interesting start to MySQL Conf

So, waking up this morning to prepare for the first day of the MySQL Conference, the first news I pick up is that it's now Oracle Conference instead. Been speaking with a few people through the day and confusion as to how this is going to impact the company, development community, users or customers reigns. Personally, I'm more apprehensive than excited about it at this point - but frankly, I've spent more time thinking of how to apply MapReduce to large-scale ETL processes than about the acquisition today.

However, I'll keep digesting this for a while, and hopefully after a couple of days of discussing it with people here I can form a better opinion. I don't know Oracle that well - it never seemed like a very easily approachable company to me, and in what contact I've had with the technology, it's felt a bit baroque and legacy, but I haven't even looked at it in a few years. It is interesting though - Oracle's acquisition of InnoDB a couple of years back was certainly one reason why there's today a number of development projects for other transactional storage engines for MySQL. Sun has been an active corporate sponsor to a number of such projects, and it doesn't seem very likely that Oracle would want to continue that. Dunno.
