Fishpool


Saturday 3 October 2009

Some scaling observations on Infobright

A couple of days ago, Baron Schwartz posted some simple load and select benchmarking of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures where MonetDB beat Infobright in most queries.

Used to the parallel IEE loader, I was surprised by the apparent slow loading speed of Baron's benchmark and decided to try and replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because this installation is severely resource-constrained below Infobright's recommended smallest installation. However, you can reproduce the results yourself with the attached script, and I will note some observations.

Continue reading...

Monday 21 September 2009

A peek under the hood in Infobright 3.2 storage engine

I've been meaning to post some real-world data on the performance of the Infobright 3.2 release which happened a few weeks ago after an extended release candidate period. We're just preparing our upgrades now, so I don't have any performance notes over significant data sets or complicated queries to post quite yet.

To make up for that, I decided to address a particular annoyance of mine in the community edition, first because it hadn't been addressed in the 3.2 release (and really, I'm hoping that doing this will get it included in 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to it being a pretty amazing performer for such a simple, undemanding package, and not making use of that would be a shame. Read on for details.

Continue reading...

Friday 28 August 2009

Why mobile computers are a bad idea

After last night's posting, I had a small exchange with @moximilian and @jludwig about my claim that calling the N900 a computer is BS and nobody wants a computer. Somehow, Nokia has gone from calling the N-series devices "multimedia computers" a couple of years back to "mobile computers" today, but it's a totally horrible thing to do from a market positioning point of view. I suppose I should clarify my reasoning a bit.

This is how I imagine the thought process has gone: Nokia, an engineering-led manufacturer of fixed-function devices (phones), has had the ambition to "put the Internet in your pocket" for quite some time. So far, so good. Now, an engineer designs a brilliant package of a high-performance programmable microprocessor, a significant amount of working memory and storage memory, and a rich set of input and output mechanisms. To an engineer, this fits the definition of a computer, thus it must be a computer.

However, that is not how the world at large sees computers. The general understanding of a computer is a device which requires constant management, is at risk from viruses and other malware, produces incomprehensible error messages and, despite being a window to the wonderful new world of Twitters, Facebooks and all kinds of information and entertainment, is best left alone when at all possible. Yes, the computer industry has made great progress in the last decades in making its products more approachable and human-friendly, but it's not there yet. Apple, for all its faults, is generally regarded as the gold standard in "computers for the normal people". Yet who hasn't seen a Mac or even an iPhone (once loaded with applications, at least) bug out in the most bizarre of ways?

Computers don't have any built-in value of their own. The value is completely attached to the applications, services and solutions to which they provide access. If it were left at that, and calling something a "mobile computer" were simply a bad choice of marketing title, I wouldn't mind. However, as long as the engineers working on the future devices think it's desirable to think of them as computers, they will carry the problems I mentioned along to future devices, because that's what computers do.

The device in your pocket is a terminal, a window onto the services of the Global Computer, and a flexible access point to things no one has yet invented. It is programmable, it does have memory, and it can compute. Even so, let's not call it a computer.

Thursday 27 August 2009

Reflections on Nokia Maemo

Earlier today Nokia announced their first handset based on what is likely to be their mobile operating system of the future - the Nokia N900 Maemo. I didn't think I would bother to pay attention, but somehow, I ended up doing so anyway, and this post is a result of that time spent thinking about it. I like quite a few things about it, but can't avoid being deeply bothered by other aspects. I hope by writing this I can make some small contribution to its future.

Why do I care? After the frustrations and disappointments with Nokia devices in the past years, I've tried not to. However, they're impossible to ignore in Finland, I have family reasons to hope this road leads to something good, and it's an attempt to make an open platform -- and I care about open platforms. Why do I feel qualified to comment? Well, because this is not really about devices, it's about software. And software is what I've always done, and managing software organizations is what I think about daily.

There's a lot to like about the N900. I haven't seen, let alone played with one, but as far as the specs go, it's a pretty nice set of hardware. Same performance as the iPhone 3GS (which also makes it faster than any Android device announced), 3D acceleration, lots of storage (and a memory card slot), and, as a welcome change from many other Nokia devices, completely standard connectors (3.5mm audio, micro-USB tethering and battery charger). On the hardware side, the only thing not to like about it is the lack of a finger-usable, multi-touch display. This device, like all the other Nokia devices before it, requires a stylus or at least long fingernails. It makes up for that by being really high resolution.

It's also based on Maemo, an open source, Linux-based operating system Nokia has been developing for several years with community participation. This makes it more attractive to me on a personal level than iPhone (which is way too closely guarded and controlled by Apple), Palm WebOS (open, but little track record), or even Android (open, but built out of pieces which have far less in common with normal Linux than Maemo). It should be fairly clear that all four mentioned are way ahead of things like Symbian S60, which clearly needs to be taken behind the shed and put out of its misery, no matter what Nokia's representatives say about it in an official capacity.

On a more professional level, the inclusion of Flash 9.4 in the platform is a big deal. I'm anxious to get hold of one and see how much work it is to make Habbo work on it. This could be the first handset capable of technically running it (enough performance, enough resolution, good enough software), though obviously tuning our service to a mobile device would still need work on UI and other pieces.

However, like I wrote above, there's also plenty that bothers me. First of all, contrary to what most people probably realize, this is actually the fourth Maemo device Nokia has released in about as many years. First was the Nokia 770 Internet Tablet, essentially an early adopter test device. Then came the N800 -- running an updated OS which required applications to be ported to it, but which was never officially released for the 770 (putting the early adopter developers in a rather awkward position). Less than a year later, the N810 added a physical keyboard and an OS upgrade (which fortunately could be installed, with some difficulty, on the N800). That was quite a long time ago, though.

In the meantime, Maemo has been completely reinvented. The original UI toolkit has been switched to Qt, which Nokia bought in the meantime, and all of the (rather limited number of) applications require significant rework to be compatible with the OS release on the N900. The public reasoning for this compatibility break has been pretty weak -- "to ensure compatibility with S60", which is also moving to the Qt framework. Why is this weak? Well, because the transition over on the S60 side also requires all of the (somewhat more numerous) applications developed for that platform to be significantly reworked. In other words, Nokia broke compatibility on both its old smartphone platform and the new platform at the same time, and offered little in the way of transitional compatibility layers to either side. Not for the first time, either. S60 applications have been broken between upgrades several times before, too.

This track record is highly worrying. Despite their years of practice and ambitions to have a lively third party mobile applications market, Nokia has clearly not grasped the importance of a stable platform to the developers they mean to attract. This lack of understanding of one of the most basic requirements is enough to counter pretty much everything I wrote about Maemo versus its closest competitors a few paragraphs earlier.

Contrast the above with the iPhone OS 2.0 to 3.0 transition. Sure, a few things did change. However, developers were given months of notice ahead of time, and the changes, apart from added functionality, were all pretty minor. Of course, Apple has a long history of making major upgrades while retaining backwards compatibility, with the Mac OS 68k to PowerPC, then to OS X, then to Intel CPU transitions.

It's also taken a LONG time for this device to be announced. I don't know, but I get the feeling it's something like a year late. The break in launch schedule between N810 and N900, the amount of changes in the Maemo platform, and the design of the device compared to for instance the N97 all scream "last year" to me. Besides, everyone knew this was coming ages ago. In the time between the launches of N810 and N900, Apple has managed to update the iPhone twice. This lack of predictability in the release cycle doesn't bode well for the next device in the line.

There is nothing more important for progress in software development than cycle time. The only cost-effective, productive way of making software today is to get feedback on it often, and the longer it stays unreleased, the later the feedback is when it comes. This seems to be another area where Nokia has not been able to shake off their "we make hardware" mentality. Unlike hardware, software can be updated with no extra cost. That's an advantage nearly everyone else has learned to make use of, and one that Nokia, if they truly desire to become a software and services powerhouse, has to finally take to heart.

N900 is not an "iPhone killer". I don't think it's meant to be. When its development started, it's unlikely the iPhone had even been announced. However, it's the best chance for Nokia to ever develop a device better than an iPhone. I hope they will - the world needs competition, and I would like to see Nokia be part of that. However, at this rate they will never catch up - Apple will have released two more major updates before the next Maemo device unless Nokia gets their act together.

I'm still hoping.

Thursday 16 July 2009

Excuse the downtime

Mea culpa, a small domain update to fix email redirection seems to have gone wrong and taken the blog offline. I didn't notice since I've been mostly offline myself, so thanks to the readers who notified me. If you can read this, the problem has been resolved.

Wednesday 27 May 2009

What we're looking for in a data integration tool

As our data warehousing process grows and the workflows get more complex, we've revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices in efficiency that would entail meant we're better off using something else as long as a distributed processing platform isn't the only thing that can get the job done. I'm also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM's Infosphere Streams, and other similar approaches. Still, I think I'll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.

So, we had a look at what's going on in the Open Source data integration field. It seems the leaders in that field are Pentaho with Kettle/Pentaho Data Integration, and Talend with Open Studio and Talend Integration Suite. Both seem pretty even in terms of features. Both companies are a bit difficult to approach as a potential customer, so I figured I should also try what would come up from the OSS approach of just posting my thoughts on the Interweb ;)

Besides the technical pilot implementations we've made to compare the basic workflow of the various tools, below is a sample of the kind of questions we're considering when evaluating the suitability of the tools.

Product roadmap, release schedule and size of the development team

  • How often should we expect platform upgrades, and what scope of changes should we prepare ourselves for?
  • What is the past track record on keeping to a regular update schedule?

Data lineage and dependency, Impact analysis

  • How to find out which tables are being used for deriving DWH dimensions and facts?

Logging, auditing, monitoring on row and job level

  • How to monitor and archive workflows on a row level (number of rows being inserted/updated/deleted)?
  • How to maintain, access and query a job execution history (start time/end time/return code)?
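To make the two logging questions above concrete, here is a minimal sketch of the kind of record being asked about: one row per job execution with start time, end time, return code and row counts. It is only an illustration using Python's standard sqlite3 module, not how Kettle or Talend do it; the table name job_runs, the helper names and the convention that a job returns a dict of counts are all assumptions made up for this example.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) a hypothetical job execution history store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_runs (
               id            INTEGER PRIMARY KEY,
               job_name      TEXT NOT NULL,
               started_at    REAL NOT NULL,
               ended_at      REAL,
               return_code   INTEGER,
               rows_inserted INTEGER DEFAULT 0,
               rows_updated  INTEGER DEFAULT 0,
               rows_deleted  INTEGER DEFAULT 0
           )"""
    )
    return conn


def run_job(conn, job_name, job):
    """Run job() and record timing, return code and row-level counts.

    Assumption: job() returns a dict with any of the keys
    'inserted', 'updated', 'deleted'.
    """
    cur = conn.execute(
        "INSERT INTO job_runs (job_name, started_at) VALUES (?, ?)",
        (job_name, time.time()),
    )
    run_id = cur.lastrowid
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    try:
        counts.update(job())
        return_code = 0
    except Exception:
        # Any failure is recorded as a non-zero return code.
        return_code = 1
    conn.execute(
        """UPDATE job_runs
              SET ended_at = ?, return_code = ?,
                  rows_inserted = ?, rows_updated = ?, rows_deleted = ?
            WHERE id = ?""",
        (time.time(), return_code, counts["inserted"],
         counts["updated"], counts["deleted"], run_id),
    )
    conn.commit()
    return run_id


if __name__ == "__main__":
    conn = open_history()
    run_job(conn, "load_users", lambda: {"inserted": 120, "updated": 4})
    for row in conn.execute(
        "SELECT job_name, return_code, rows_inserted, rows_updated FROM job_runs"
    ):
        print(row)
```

With a history like this, "which jobs failed last night" or "how many rows did this job touch over the past month" becomes a plain SQL query, which is the kind of access the questions above are probing for in a tool.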

Version control

  • How to track and restore changes in jobs?

Multi-user environment

  • How can several developers work together?

Change Data Capture

  • How to assist incremental loads?

Data profiling

  • How can data sources be examined?

Job recovery

  • How to recover from possible failures in jobs (such as lost database connection)?

Deploy jobs

  • How to move jobs from one repository to another (development to testing to production)?

Sunday 24 May 2009

Hello, MySQL 6.0, err, something

I'm conflicted about the latest twist of the MySQL release saga, i.e. the announcement of the 6.0.11 alpha version and the accompanying note that it's the last 6.0 release and will be replaced by the already discussed milestone model. From an engineering point of view, I think this is the right step. I can't be sure, though, because I can't really tell exactly which engineering model has been chosen: trunk-first, then backport, or fix-in-releases, then forward port. I also can't tell whether the milestone model is going to be timeboxed or feature-scoped. Personally, I would prefer to see the former in both cases.

From a customer point of view, I'm even more confused, though much less concerned. Okay, so 6.0 won't become the marketing version number of any MySQL Enterprise release? Doesn't matter. 5.4 needs to come out first anyway, preferably sooner with a concrete, well-tested feature set, than later with more planned-but-unfinished features stuffed in it. What the release after that is going to be called makes no difference to me, as long as it's also going to contain solid improvements and comes out on a predictable schedule that doesn't force me to look for something drastically different in order to deal with scale.

That being said, it's still weird. If the thought of a 6.0 GA release is scrapped, why release anything and still call it 6.0? I guess it's just tying up loose ends, but that's an engineering thing, and only the number of existing source branches with stuff to merge together matters, not the version number put on it...

Tuesday 12 May 2009

Confusing Sun communication about MySQL 5.4

Just received an email newsletter from Sun titled "MySQL 5.4 Preview Release" which states:

Sun Microsystems recently released MySQL 5.4, delivering performance and scalability improvements enabling the InnoDB storage engine to scale up to 16-way x86 servers and 64-way CMT servers.

MySQL 5.4 also includes new subquery optimizations and JOIN improvements, resulting in 90% better response times for certain queries.

Apparently, the confusion about the contents of the release I wrote about earlier continues to reign inside Sun as well. MySQL 5.4 has not been released by any reasonable meaning of the word, since there's "only" a preview available at this time. Compare this to Windows 7: that's already a Release Candidate, but it has not been released. Also, the preview release available does not include new subquery optimizations or JOIN improvements. Having planned such improvements doesn't count.

As I wrote earlier, the best of the rather bad excuses for the release labeling offered to me was that Sun wanted to avoid confusion by not releasing many versions at once. I think that got replaced (and then some) by plenty of extra confusion about when and what was released, instead. Sorry, no good. Try again, 'kthxbye.

Monday 4 May 2009

What does Oracle mean for Java?

Over the past two weeks I've been mostly focused on MySQL, but the big-ticket item in the Sun/Oracle deal is not databases, it's Java. However, it's also the domain where the outcome is far harder to predict. It was a big deal when Sun decided to open source Java, but the fact of the matter is that the first fully open source release isn't out yet, and Sun has been keeping the testing and certification kit off-limits for open source communities. This means it would still be far too easy for OpenJDK to be killed off.

I've been keeping clear of Oracle for several years, and can't even begin to guess what their position on this is. Oracle has been a pretty active contributor to Linux in particular for several years, and I'm sure their open source strategy and how it works together with their business is pretty well established within at least the engineering parts of the company. At the same time, their notoriously aggressive market tactics make sure that everyone's wary of their next move. Java is a huge part of Oracle's business, and after they purchased BEA, I wouldn't be surprised if Oracle were already the biggest Java company (in terms of revenue) ahead of both Sun and IBM. After completing the Sun acquisition, that'll be guaranteed.

That's a big balance shift for the overall Java community. Now, Oracle is a smart company. My worry is they might emphasize short-term tactical market advantage (owning all of Java, JRockit, Glassfish and WebLogic to compete against other middleware and business applications) over the long-term strategic benefit of a unified platform competing with .NET and the host of open source platforms from PHP and Ruby to Python. With such a wide field, following through on and improving the open source platform process would be the right thing to do - and it would help me :)

Thursday 30 April 2009

The difference between conversion and retention

Picked up a piece of analysis today from my newsfeed regarding Twitter's audience. Nielsen has posted information about Twitter's month-to-month retention (40%) and compared that to Facebook's and MySpace's. Pete Cashmore over at Mashable promptly misread the basic information and came to an entirely wrong conclusion about the stats, titling his post about it "60% quit Twitter in the first month". A simple misunderstanding of basic audience analysis like this is the crucial difference between explosively growing traffic and a failure. That's a fail for you, Pete.

What's wrong? Well, retention is a separate matter from conversion. 40% conversion from a trial registration to being a continuing active user to the second month would not be a bad conversion rate. It's not stratospherically great, I've seen better, but I wouldn't be terribly unhappy about such a figure. However, Nielsen didn't say anything at all about first-to-second month conversion. This is what they DID say: "Twitter’s audience retention rate, or the percentage of a given month’s users who come back the following month, is currently about 40 percent."

That's pretty plain English when you take the time to read it. Month to month, regardless of visitor lifetime, not first to second month. On this metric, 40% retention is not good at all, and will definitely be a limiting factor to Twitter's traffic and audience size over time, just as the Nielsen article points out (and shows the math for). For any given retention rate, there is simply a certain maximum audience reach beyond which any new traffic can't overcome the leaving base, since new traffic is not an inexhaustible supply.
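The ceiling can be illustrated with a small simulation. This is a simplified sketch, not Nielsen's own calculation: it assumes a constant number of new users arriving every month and a single flat month-to-month retention rate, and the function names and the one-million-per-month inflow are made up for illustration. Under those assumptions the audience follows A(next) = retention * A + new, which settles at new / (1 - retention).

```python
def simulate_audience(new_users_per_month, retention, months):
    """Return the active audience after each month, starting from zero."""
    audience = 0.0
    history = []
    for _ in range(months):
        # Returning users from last month plus this month's newcomers.
        audience = audience * retention + new_users_per_month
        history.append(audience)
    return history


def audience_ceiling(new_users_per_month, retention):
    """Steady state of A = retention * A + new, i.e. new / (1 - retention)."""
    return new_users_per_month / (1.0 - retention)


if __name__ == "__main__":
    inflow = 1_000_000  # hypothetical constant monthly inflow of new users
    for rate in (0.40, 0.70):
        history = simulate_audience(inflow, rate, 36)
        print(rate, round(history[-1]), round(audience_ceiling(inflow, rate)))
```

At 40% retention the ceiling is 1 / (1 - 0.4), or about 1.67 times the monthly inflow of new users, no matter how long the service runs. Raising the ceiling means either raising retention or finding ever more new users each month.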

And since today is a busy day, that concludes the free startup advice. Take the time to understand the difference between these metrics; you'll thank yourself for it later.
