Fishpool


Thursday 27 August 2009

Reflections on Nokia Maemo

Earlier today Nokia announced their first handset based on what is likely to be their mobile operating system of the future - the Nokia N900 Maemo. I didn't think I would bother to pay attention, but somehow, I ended up doing so anyway, and this post is a result of that time spent thinking about it. I like quite a few things about it, but can't avoid being deeply bothered by other aspects. I hope by writing this I can make some small contribution to its future.

Why do I care? After the frustrations and disappointments with Nokia devices in the past years, I've tried not to. However, they're impossible to ignore in Finland, I have family reasons to hope this road leads to something good, and it's an attempt to make an open platform -- and I care about open platforms. Why do I feel qualified to comment? Well, because this is not really about devices, it's about software. And software is what I've always done, and managing software organizations is what I think about daily.

There's a lot to like about the N900. I haven't seen, let alone played with one, but as far as the specs go, it's a pretty nice set of hardware. Same performance as the iPhone 3GS (which also makes it faster than any Android device announced), 3D acceleration, lots of storage (and a memory card slot), and, as a welcome change from many other Nokia devices, completely standard connectors (3.5mm audio, micro-USB tethering and battery charger). On the hardware side, the only thing not to like about it is the lack of a finger-usable, multi-touch display. This device, like all the other Nokia devices before it, requires a stylus or at least long fingernails. It makes up for that by being really high resolution.

It's also based on an open source, Linux-based operating system Nokia has been developing for several years with community participation, Maemo. This makes it more attractive to me on a personal level than iPhone (which is way too closely guarded and controlled by Apple), Palm WebOS (open, but little track record), or even Android (open, but built out of pieces which have far less in common with normal Linux than Maemo does). It should be fairly clear that all four mentioned are way ahead of things like Symbian S60, which clearly needs to be taken behind the shed and put out of its misery, no matter what Nokia's representatives say about it in their official capacity.

On a more professional level, the inclusion of Flash 9.4 in the platform is a big deal. I'm anxious to get hold of one and see how much work it is to make Habbo work on it. This could be the first handset capable of technically running it (enough performance, enough resolution, good enough software), though obviously tuning our service to a mobile device would still need work on UI and other pieces.

However, like I wrote above, there's also plenty that bothers me. First of all, contrary to what most people probably realize, this is actually the 4th Maemo device Nokia has released in about as many years. First was the Nokia 770 Internet Tablet, essentially an early adopter test device. Then came the N800 -- running an updated OS which required applications to be ported to it, but which was never officially released for the 770 (putting the early adopter developers in a rather awkward position). Less than a year later the N810 added a physical keyboard and an OS upgrade (which fortunately could be installed, with some difficulty, on the N800). That was quite a long time ago, though.

In the meantime, Maemo has been completely reinvented. The original UI toolkit has been switched to Qt, which Nokia bought in the meantime, and all of the (rather limited number of) applications require significant rework to be compatible with the OS release on the N900. The public reasoning for this compatibility break has been pretty weak -- "to ensure compatibility with S60", which also is moving on to the Qt framework. Why is this weak? Well, because the transition over on the S60 side also requires all of the (somewhat more numerous) applications developed for that platform to be significantly reworked. In other words, Nokia broke compatibility on both its old smartphone platform and the new platform at the same time, and offered few transitional compatibility layers to either side. Not for the first time, either. S60 applications have been broken between upgrades several times before, too.

This track record is highly worrying. Despite their years of practice and ambitions to have a lively third party mobile applications market, Nokia has clearly not grasped the importance of a stable platform to the developers they mean to attract. This lack of understanding of one of the most basic requirements is enough to counter pretty much everything I wrote about Maemo versus its closest competitors a few paragraphs earlier.

Contrast the above with the iPhone OS 2.0 to 3.0 transition. Sure, a few things did change. However, developers were given months of notice ahead of time, and the changes, apart from added functionality, were all pretty minor. Of course, Apple has a long history of making major upgrades while retaining backwards compatibility, with the Mac OS 68k to PowerPC, then to OS X, then to Intel CPU transitions.

It's also taken a LONG time for this device to be announced. I don't know, but I get the feeling it's something like a year late. The break in launch schedule between N810 and N900, the amount of changes in the Maemo platform, and the design of the device compared to for instance the N97 all scream "last year" to me. Besides, everyone knew this was coming ages ago. In the time between the launches of N810 and N900, Apple has managed to update the iPhone twice. This lack of predictability in the release cycle doesn't bode well for the next device in the line.

There is nothing more important for progress in software development than cycle time. The only cost-effective, productive way of making software today is to get feedback on it often, and the longer it stays unreleased, the later that feedback arrives. This seems to be another area where Nokia has not been able to shake off their "we make hardware" mentality. Unlike hardware, software can be updated with no extra cost. That's an advantage nearly everyone else has learned to make use of, and one that Nokia, if they truly desire to become a software and services powerhouse, has to finally take to heart.

N900 is not an "iPhone killer". I don't think it's meant to be. When its development started, it's unlikely the iPhone had even been announced. However, it's the best chance for Nokia to ever develop a device better than an iPhone. I hope they will - the world needs competition, and I would like to see Nokia be part of that. However, at this rate they will never catch up - Apple will have released two more major updates before the next Maemo device unless Nokia gets their act together.

I'm still hoping.

Thursday 16 July 2009

Excuse the downtime

Mea culpa, a small domain update to fix email redirection seems to have gone wrong and taken the blog offline. I didn't notice since I've been mostly offline myself, so thanks to the readers who notified me. If you can read this, the problem has been resolved.

Wednesday 27 May 2009

What we're looking for in a data integration tool

As our data warehousing process grows and the workflows get more complex, we've revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices on efficiency that would entail meant we're better off using something else until a distributed processing platform is the only thing that can get the job done. I'm also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM's Infosphere Streams, and other similar approaches. Still, I think I'll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.

So, we had a look at what's going on in the Open Source data integration field. It seems the leaders in that field are Pentaho with Kettle/Pentaho Data Integration, and Talend with Open Studio and Talend Integration Suite. Both seem pretty even in terms of features. Both companies are a bit difficult to approach as a potential customer, so I figured I should also try what would come up from the OSS approach of just posting my thoughts on the Interweb ;)

Besides the technical pilot implementations we've made to compare basic workflow of the various tools, below is a sample of the kind of questions we're considering when evaluating the suitability of the tools.

Product roadmap, release schedule and size of the development team

  • How often should we expect platform upgrades, and what scope of changes should we prepare ourselves for?
  • Past track record on keeping to a regular updates schedule

Data lineage and dependency, Impact analysis

  • How to find out which tables are being used for deriving DWH dimensions and facts?

Logging, auditing, monitoring on row and job level

  • How to monitor and archive workflows on a row level (amount of rows being inserted/updated/deleted)?
  • How to maintain, access and query a job execution history (start time/end time/return code)?
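
To make the job-level question concrete, here is a minimal sketch of the kind of execution history we would want to query. It is not taken from Kettle or Talend; the `job_run` table, its columns and the helper names are all assumptions made up for illustration, using SQLite only because it needs no setup.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per job execution, with row-level counters.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_run (
    run_id        INTEGER PRIMARY KEY,
    job_name      TEXT NOT NULL,
    started_at    TEXT NOT NULL,
    ended_at      TEXT,
    return_code   INTEGER,
    rows_inserted INTEGER DEFAULT 0,
    rows_updated  INTEGER DEFAULT 0,
    rows_deleted  INTEGER DEFAULT 0
)
"""


def open_history(path=":memory:"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _now():
    return datetime.now(timezone.utc).isoformat()


def start_run(conn, job_name):
    """Record the start of a job and return its run id."""
    cur = conn.execute(
        "INSERT INTO job_run (job_name, started_at) VALUES (?, ?)",
        (job_name, _now()),
    )
    return cur.lastrowid


def finish_run(conn, run_id, return_code, inserted=0, updated=0, deleted=0):
    """Record end time, return code and row counts for a finished job."""
    conn.execute(
        "UPDATE job_run SET ended_at = ?, return_code = ?, rows_inserted = ?, "
        "rows_updated = ?, rows_deleted = ? WHERE run_id = ?",
        (_now(), return_code, inserted, updated, deleted, run_id),
    )


def failed_runs(conn, job_name):
    """Return (run_id, return_code) for every failed run of a job."""
    return conn.execute(
        "SELECT run_id, return_code FROM job_run "
        "WHERE job_name = ? AND return_code != 0 ORDER BY run_id",
        (job_name,),
    ).fetchall()
```

What we are really asking of the tools is whether they keep something like this for us, and whether we can query it with plain SQL.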

Version control

  • How to track and restore changes in jobs?

Multi-user environment

  • How can several developers work together?

Change Data Capture

  • How to assist incremental loads?
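
One common way to do incremental loads is a "high-watermark" on a modification timestamp. The sketch below is only an illustration of that idea under assumptions of my own: rows are plain dicts and the `updated_at` column name is hypothetical. It is not how any particular tool implements Change Data Capture.

```python
def incremental_batch(source_rows, last_watermark):
    """Pick rows changed since the previous load and compute the next watermark.

    source_rows: iterable of dicts with a comparable "updated_at" value
                 (hypothetical column name).
    last_watermark: highest "updated_at" seen in the previous load.
    """
    changed = [row for row in source_rows if row["updated_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts
    # from the same point.
    new_watermark = max(
        (row["updated_at"] for row in changed), default=last_watermark
    )
    return changed, new_watermark
```

A timestamp watermark like this cannot see deleted rows, which is one reason to ask what the tools offer beyond it.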

Data profiling

  • How can data sources be examined?

Job recovery

  • How to recover from possible failures in jobs (such as lost database connection)?
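
The simplest form of recovery is to retry a failed step with a growing delay. The following is a rough sketch of that, assuming a hypothetical `TransientJobError` that a job step raises for recoverable problems such as a dropped connection. A real tool would also need checkpoints so a retried job does not reload rows it has already written.

```python
import time


class TransientJobError(Exception):
    """Hypothetical error type for failures worth retrying."""


def run_with_retry(step, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run step(), retrying on TransientJobError with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2 * base_delay, ...
    The last failure is re-raised so the scheduler can mark the job as failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientJobError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```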

Deploy jobs

  • How to move jobs from one repository to another (development to testing to production)?

Sunday 24 May 2009

Hello, MySQL 6.0, err, something

I'm conflicted about the latest twist of the MySQL release saga, i.e. the announcement of the 6.0.11 alpha version and the accompanying note that it's the last 6.0 release and will be replaced by the already discussed milestone model. From an engineering point of view, I think this is the right step. I'm not sure about that, though, because I can't really tell exactly what engineering model has been chosen: trunk-first, then backport, or fix-in-releases, then forward port. I also can't tell whether the milestone model is going to be timeboxed or feature-scoped. Personally, I would prefer to see the former of both alternatives.

From a customer point of view, I'm even more confused, though much less concerned. Okay, so 6.0 won't become the marketing version number of any MySQL Enterprise release? Doesn't matter. 5.4 needs to come out first anyway, preferably sooner with a concrete, well-tested feature set, than later with more planned-but-unfinished features stuffed in it. What the release after that is going to be called makes no difference to me, as long as it's also going to contain solid improvements and comes out on a predictable schedule that doesn't force me to look for something drastically different in order to deal with scale.

That being said, it's still weird. So if the thought of a 6.0 GA release is scrapped, why release anything and still call it 6.0? I guess it's just tying up loose ends, but that's an engineering thing, and only the number of existing source branches with stuff to merge together matters, not the version number put to it...

Tuesday 12 May 2009

Confusing Sun communication about MySQL 5.4

Just received an email newsletter from Sun titled "MySQL 5.4 Preview Release" which states:

Sun Microsystems recently released MySQL 5.4, delivering performance and scalability improvements enabling the InnoDB storage engine to scale up to 16-way x86 servers and 64-way CMT servers.

MySQL 5.4 also includes new subquery optimizations and JOIN improvements, resulting in 90% better response times for certain queries.

Apparently, the confusion about the contents of the release I wrote about earlier continues to reign inside Sun as well. MySQL 5.4 has not been released by any reasonable meaning of the word, since there's "only" a preview available at this time. Compare this to Windows 7: that's already a Release Candidate, but it has not been released. Also, the preview release available includes neither new subquery optimizations nor JOIN improvements. Having planned such improvements doesn't count.

As I wrote earlier, the best of the rather bad excuses for the release labeling offered to me was that Sun wanted to avoid confusion by not releasing many versions at once. I think that got replaced (and then some) by plenty of extra confusion about when and what was released, instead. Sorry, no good. Try again, 'kthxbye.

Monday 4 May 2009

What does Oracle mean for Java?

Over the past two weeks I've been mostly focused on MySQL, but the big-ticket item in the Sun/Oracle deal is not databases, it's Java. However, it's also the domain which is far less clear to predict. It was a big deal when Sun decided to open source Java, but the fact of the matter is that the first fully open source release isn't out yet, and Sun has been keeping the testing and certification kit off-limits for open source communities. This means it would still be far too easy for OpenJDK to be killed off.

I've been keeping clear of Oracle for several years, and can't even begin to guess what their position on this is. Oracle has been a pretty active contributor to Linux in particular for several years, and I'm sure their open source strategy and how it works together with their business is pretty well established within at least the engineering parts of the company. At the same time, their notoriously aggressive market tactics make sure that everyone's wary of their next move. Java is a huge part of Oracle's business, and after they purchased BEA, I wouldn't be surprised if Oracle were already the biggest Java company (in terms of revenue) ahead of both Sun and IBM. After completing the Sun acquisition, that'll be guaranteed.

That's a big balance shift for the overall Java community. Now, Oracle is a smart company. My worry is they might emphasize short-term tactical market advantage (owning all of Java, JRockit, Glassfish and WebLogic to compete against other middleware and business applications) over long-term strategic benefit of a unified platform competing with .NET and the host of open source platforms from PHP and Ruby to Python. With such a wide field, following up on, and improving on the open source platform process would be the right thing to do - and it would help me :)

Thursday 30 April 2009

The difference between conversion and retention

Picked up a piece of analysis today from my newsfeed regarding Twitter audience. Nielsen has posted information about Twitter's month-to-month retention (40%) and compared that to Facebook's and MySpace's. Pete Cashmore over at Mashable promptly misread the basic information and came to an entirely wrong conclusion about the stats, titling his post about it as "60% quit Twitter in the first month". A simple misunderstanding of basic audience analysis like this is the crucial difference between explosively growing traffic and a failure. That's a fail for you, Pete.

What's wrong? Well, retention is a separate matter from conversion. 40% conversion from a trial registration to being a continuing active user to the second month would not be a bad conversion rate. It's not stratospherically great, I've seen better, but I wouldn't be terribly unhappy about such a figure. However, Nielsen didn't say anything at all about first-to-second month conversion. This is what they DID say: "Twitter’s audience retention rate, or the percentage of a given month’s users who come back the following month, is currently about 40 percent."

That's pretty plain English when you take the time to read it. Month to month, regardless of visitor lifetime, not first to second month. On this metric, 40% retention is not good at all, and will definitely be a limiting factor to Twitter's traffic and audience size over time, just as the Nielsen article points out (and shows the math for). For any given retention rate, there just is a certain maximum audience reach beyond which any new traffic can't overcome the leaving base, since new traffic is not an inexhaustible supply.
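
To see where that ceiling comes from, here is a back-of-the-envelope sketch. It is my own simplification rather than Nielsen's exact calculation: it assumes a constant number of new users arriving every month and a fixed month-to-month retention rate. Under those assumptions the audience follows A(t+1) = retention * A(t) + new users, which levels off at new users / (1 - retention). The function names are made up for the example.

```python
def audience_ceiling(new_users_per_month, retention):
    """Steady-state audience for a constant inflow and a fixed retention rate."""
    if not 0 <= retention < 1:
        raise ValueError("retention must be in [0, 1)")
    return new_users_per_month / (1 - retention)


def simulate_audience(new_users_per_month, retention, months, start=0.0):
    """Apply A(t+1) = retention * A(t) + new_users month by month."""
    audience = start
    for _ in range(months):
        audience = audience * retention + new_users_per_month
    return audience


# With 40% retention, a million new users a month never adds up to more
# than about 1.67 million monthly users; at 70% the same inflow supports
# about 3.33 million.
low = audience_ceiling(1_000_000, 0.40)
high = audience_ceiling(1_000_000, 0.70)
```

In other words, at 40% retention the only way to grow the audience is to keep growing the inflow, and that is exactly the supply that eventually runs out.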

And since today is a busy day, that concludes the free startup advice. Take the time to understand the difference between these metrics; you'll thank yourself for it later.

Tuesday 28 April 2009

The MySQL community outlook

While I can not consider myself a member of MySQL's community of developers, I've been watching those developments the same way I follow the development of Linux and many of the Java and Apache projects our own services depend on. It was great to meet many of the core members of the development community and get some insight into their thoughts about the future.

Baron Schwartz called in his Percona Performance Conference keynote on Thursday for a new, active MySQL community to take the driver's seat in the development of the database, not just in the incremental improvements way of bug fixing and performance improvement, but also by setting a vision for the next generation MySQL. It's a call to action greatly needed, and an important one despite the active existence of the Drizzle project. This is because while Drizzle already has a vision for the future, it's a radical departure for the MySQL userbase and one which will not necessarily have a smooth upgrade path. Many of the same MySQL users feeling most of the pain of MySQL's current limitations are also those who will not be able to easily upgrade to a radically different architecture due to the amount of data and dependencies in their existing infrastructure.

It's a gap which needs a careful approach of incremental changes to the MySQL base functionality to help users bridge over to a new, brighter future. These changes do not need to be slow. Rapid incremental changes are likely to be easier to digest with a clear upgrade and downgrade path from iteration to iteration leaving the organizations with biggest infrastructures to consider a way to set their own pace through the transition, rather than being forced to take one huge leap and risk a crash to the concrete wall of unexpected incompatibility.

A few such pieces of incremental community improvements I learned a great deal about during the week were the performance and scalability improvements by Google and Percona and their MySQL 5.4 equivalents, the Xtrabackup utility, not only as an alternative to but an improvement on the Innobackup tool, which has significant limitations to its use in large-scale deployments, and the Tungsten Replicator, providing useful cross-database replication and rapid failover features that help upgrades and transitions to new database installations while minimizing downtime and impact to users. I'm also curious about the storage engine development by Primebase - I don't think there's ultimately a lot of room for multiple transactional storage engines, but as a competitive research topic, it's certainly good to see alternatives to InnoDB.

[Be sure to check out my earlier posts of the conference learnings as well!]

Monday 27 April 2009

Database innovation on MySQL

If MySQL's core server development and release process has been somewhat of a frustration to the userbase over the past few years, clearly another part of the ecosystem has thrived in ways which brought exciting fruit to the Expo part of this year's conference. MySQL has become a hub of innovation in both transactional and analytics databases in ways which have turned many of my concerns to enthusiasm.

I've already discussed the technologies for data analytics on MySQL, in particular Infobright's storage engine technology. This year I took the opportunity to learn a bit more about their appliance-based competitor Kickfire as well, and it certainly looks like a solid product. I still don't completely understand what the "SQL chip" in their appliance does, but certainly the combination of a special-purpose columnar storage, high-speed memory interface and high-performance indexing should form the basis for a great analytics system. How it compares in practice to Infobright's software-only approach, time will tell. I'd be interested in real-world experiences, so if you have some to share, please get in touch. Finally, I missed the Calpont info myself, but once it is released, I'll try to get the time to try it out.

I'm even more excited about the new solutions on the transactional side of things. I've certainly been among the people frustrated by MySQL/InnoDB's scaling issues on modern hardware, and glad to see that the optimization work done by Google, Innobase and Percona is being accepted to the "mainline" MySQL Enterprise Server. However, what I did not expect to see were the solutions shown by Virident and Schooner for accelerated, Flash-based storage appliances. It's interesting how both of these companies have chosen to apply their platforms to accelerate both InnoDB and Memcached, and I'm looking forward to the chance to spend more time with both solutions. While both are Flash-based approaches, they seem to have taken very different architectural choices in the way they're exposing the memory to the software layer, and I'm curious to see the impact those choices have on both IO and storage capacity scaling. In any event, these are unique technologies unlike what I've seen for other platforms at this time. I need to learn how they plan to work with the community and Sun/Oracle in keeping the solutions functionally compatible with standard MySQL server.

The ecosystem doesn't end at the appliances, though. On the software side of things, I was pleasantly surprised by the state of Primebase's PBXT storage engine as well as Continuent's new Tungsten Replicator. While both are still early in their development path, they seem to hold a lot of promise for improving the performance of MySQL's built-in functionality in InnoDB as well as in the replication subsystem. Robert Hodges's demo of Tungsten's set-up and management also looked like it will greatly simplify replication administration, which is a big deal for anyone who has to manage 20+ replicated database systems. What's more, if Robert and his team crack the multi-threaded replication problem, a major scalability concern is lifted.

[Be sure to check out my earlier posts of the conference learnings as well!]

Sunday 26 April 2009

MySQL 2009-2010 roadmap

The development model for MySQL Enterprise took a big step forward with the new community process Karen Padir announced in her Tuesday keynote. This is great for both the open source server as well as enterprise customers, because the closer the tie between the community and the development path, the better the quality and faster the progress towards new functionality. I'm not entirely sure everyone at Sun still completely understands why a working community process is a benefit for the enterprise customer base, but I'm happy steps are made in the right direction, and it seems to me that Karen Padir is going to be a good leader for the product.

A big improvement, for sure, and still there's more to improve here. To borrow the words of Baron Schwartz, MySQL currently "has" a community, while it would really be to everyone's benefit if instead MySQL would "be" a community. I would suggest that the goal should be not monthly "community" releases from Sun, but a completely out-in-the-open development process with the community members being in the driver's seat regarding patch acceptance, quality management and releases, much like the Fedora process works. Sure, there's a role for corporate sponsorship and project management, but it's a distinct difference of responsibility. The Drizzle project is another good example of how this can work. An important point to realize here is that there is a difference between the community, an active partner in the process of making the software better, and the unpaid userbase. The latter is an acquisition and conversion vehicle for the former, but they're separate entities.

The announcement of the 5.4 server was at the same time an encouraging as well as confusing example of the changes. I would like to be enthusiastic about it, but we've seen MySQL (if not Sun) pre-announce releases that didn't appear before, and it's a long way to the promised release time. I asked two questions of many, many MySQL staff members during the week: why is it that 5.4 was announced now, but is slated to be released GA only in December when it clearly demonstrates massive scalability improvements already, and why is it that the feature list for the final 5.4 release is much longer than what's already completed? I did not get a really coherent answer from anyone. Best I could decipher, there is somewhere a faceless "marketing" which decided that a) there should only be one release announced and b) 40% demonstrated improvement is not good enough when it's not the only improvement that can be made. I also learned that it's not unlikely that much of the work which has gone to 5.4.0-beta would be backported to the 5.1 branch and released in a 5.1 point release before the actual 5.4 release, because in fact they can be considered bugfixes.

I consider myself not entirely inexperienced in the decision processes for release management, and know intimately the clarity hindsight provides to well-intentioned choices made with best available information. I know there are many areas to consider, and every decision made is a compromise. I still can't bring myself to completely understand what exactly led to this particular approach. Let's recap:

  • Improvements already made are announced and made available in beta test form, but beta does not contain everything planned for the release
  • Final release is intentionally delayed by 7 months adding significant project risk to it, despite having no previously committed release schedule
  • Former release version is planned to be improved by making significant performance-altering changes in a point release in order to offset the delay
  • Such a release adds risk to maintenance roadmap and steals away upgrade motivation from the upcoming version

How this plan serves either Sun, the community, the free userbase or the enterprise customers is a mystery to me. It would certainly seem far simpler and clearer to take an aggressive quality assurance and release testing position with the intent to push 5.4 out as a rock-solid replacement upgrade to 5.1 as soon as possible, and only then continue with further updates as a 5.5 release. This would definitely be welcomed by everyone but the class of enterprise customers who like to hear about future versions two years in advance - but keep in mind that such conservative enterprises are not MySQL's primary customer base anyway, and if MySQL is to make inroads there, rapidly improving the quality and performance of the product in the meantime would still be a sensible step.

There is the argument that if I want to get those performance features now, I can use Percona/XtraDB or MySQL 5.1 plus the InnoDB Plugin. While technically that route does work, and clearly is worth pursuing as a user, it does have its drawbacks in terms of requiring multiple sources and it's hard to see how it supports MySQL/Sun's commercial interests, the latter surely having been a consideration in the 5.4 release plans.

Thus far in the argument I have ignored one new component - Oracle. That's because to my understanding the process I've discussed did not consider the acquisition, which was unknown to most people before Monday. Clearly this changes a few points. It's not necessarily in the interests of Oracle for MySQL to continue making inroads to enterprise customers, though if someone's going to be cannibalizing Oracle's database sales, it might as well be Oracle. InnoDB Plugin will also be a product from the same company as MySQL Server in the near future - in fact, in a future likely to be fact before the final GA release of MySQL 5.4. What is the role of a delayed 5.4 release in this equation, then?
