Fishpool

Tag - open source

Monday 23 November 2009

Notes about Fedora 12

Another six months, another Fedora release. Apparently I still couldn't resist the temptation of upgrading, given I got a few days of flu-related downtime. Happy to report it's a pretty smooth release, with most things in the expected places:

  • GNOME is a tiny bit cleaner than it used to be, which is as expected, given that's what it's been doing for the last 5 releases. Apparently next time it'll be something completely different. I don't know if I should be excited or apprehensive about that..
  • PulseAudio continues to improve - however, I could swear I've successfully used a Bluetooth headset with Skype earlier, and now audio gets stuck if I pair a headset. That's not the most typical use case, of course, and for the most part, audio no longer sucks on Linux. Too bad my laptop's built-in microphone does suck (don't know if that's with Linux or in general), so I do need a headset to make Skype calls.
  • Apparently Empathy is approaching a usable IM client now that it's been made the default. Still slightly prematurely, IMO, and I will continue to use Pidgin with all its warts for the time being.
  • OpenOffice still works as expected, which is to say, slowly, but reasonably predictably.
  • I can get rid of many of the hacks I've done to make multihead work as I like without setting it up every time, because now Xorg does that by default. Yippee!
  • Evolution still continues to gain one or two major regressions per release, and lose none of the earlier ones. The tally now seems to be: brken live search, fkdup IMAP sync, scrwy calendaring, and, as an additional feature, automatically selecting the wrong recipient address out of several available emails despite being repeatedly told otherwise. Seriously, the thing needs to be taken behind the shed and shot in the head. And I need to find a decent email program. Thunderbird 2 wasn't that - and 3 still isn't done. Sigh.
  • Google Chromium is about 10x faster than Firefox, and by far the easiest way to install a 32 bit browser (working Flash!) on a 64 bit OS (I should probably reinstall to 32 bits all around, this bits thing doesn't help me do anything better).
That concludes my "yes, I'm a Linux geek" postings for the next six months, I guess. :)

Saturday 3 October 2009

Some scaling observations on Infobright

A couple of days ago, Baron Schwartz posted some simple load and select benchmarking of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures where MonetDB beat Infobright in most queries.

Used to the parallel IEE loader, I was surprised by the apparent slow loading speed of Baron's benchmark and decided to try and replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because this installation is severely resource-constrained below Infobright's recommended smallest installation. However, you can reproduce the results yourself with the attached script, and I will note some observations.

Continue reading...

Monday 21 September 2009

A peek under the hood in Infobright 3.2 storage engine

I've been meaning to post some real-world data on the performance of the Infobright 3.2 release which happened a few weeks ago after an extended release candidate period. We're just preparing our upgrades now, so I don't have any performance notes over significant data sets or complicated queries to post quite yet.

To make up for that, I decided to address a particular annoyance of mine in the community edition, first because it hadn't been addressed in the 3.2 release (and really, I'm hoping doing this would get it included in 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to being a pretty amazing piece of performance for such a simple, undemanding package in general, and not making use of that would be a shame. Read on for details.

Continue reading...

Thursday 27 August 2009

Reflections on Nokia Maemo

Earlier today Nokia announced their first handset based on what is likely to be their mobile operating system of the future - the Nokia N900 Maemo. I didn't think I would bother to pay attention, but somehow, I ended up doing so anyway, and this post is a result of that time spent thinking about it. I like quite a few things about it, but can't avoid being deeply bothered by other aspects. I hope by writing this I can make some small contribution to its future.

Why do I care? After the frustrations and disappointments with Nokia devices in the past years, I've tried not to. However, they're impossible to ignore in Finland, I have family reasons to hope this road leads to something good, and it's an attempt to make an open platform -- and I care about open platforms. Why do I feel qualified to comment? Well, because this is not really about devices, it's about software. And software is what I've always done, and managing software organizations is what I think about daily.

There's a lot to like about the N900. I haven't seen, let alone played with one, but as far as the specs go, it's a pretty nice set of hardware. Same performance as the iPhone 3GS (which also makes it faster than any Android device announced), 3D acceleration, lots of storage (and a memory card slot), and, as a welcome change from many other Nokia devices, completely standard connectors (3.5mm audio, micro-USB tethering and battery charger). On the hardware side, the only thing not to like about it is the lack of a finger-usable, multi-touch display. This device, like all the other Nokia devices before it, requires a stylus or at least long fingernails. It makes up for that by being really high resolution.

It's also based on an open source, Linux-based operating system Nokia has been developing for several years with community participation, Maemo. This makes it more attractive to me on a personal level than iPhone (which is way too closely guarded and controlled by Apple), Palm WebOS (open, but little track record), or even Android (open, but built out of pieces which have far less in common with normal Linux than Maemo). It should be fairly clear that all four mentioned are way ahead of things like Symbian S60, which clearly needs to be taken behind the shed and put out of its misery, no matter what Nokia's representatives say about it in an official capacity.

On a more professional level, the inclusion of Flash 9.4 in the platform is a big deal. I'm anxious to get hold of one and see how much work it is to make Habbo work on it. This could be the first handset capable of technically running it (enough performance, enough resolution, good enough software), though obviously tuning our service to a mobile device would still need work on UI and other pieces.

However, like I wrote above, there's also plenty that bothers me. First of all, contrary to what most people probably realize, this is actually the 4th Maemo device Nokia has released in about as many years. First was the Nokia 770 Internet Tablet, essentially an early adopter test device. Then came N800 -- running an updated OS which required applications to be ported to it, but which never was officially released for the 770 (putting the early adopter developers in a rather awkward position). Less than a year later N810 added a physical keyboard and an OS upgrade (which fortunately could be installed, with some difficulty, on the N800). That was quite a long time ago, though.

In the meantime, Maemo has been completely reinvented. The original UI toolkit has been switched to Qt, which Nokia bought in the meantime, and all of the (rather limited quantity of) applications require significant rework to be compatible with the OS release on the N900. The public reasoning for this compatibility break has been pretty weak -- "to ensure compatibility with S60", which also is moving on to the Qt framework. Why is this weak? Well, because the transition over on the S60 side also requires all of the (somewhat more numerous) applications developed for that platform to be significantly reworked. In other words, Nokia broke compatibility on both its old smartphone platform and the new platform at the same time, and offered little in the way of transitional compatibility layers to either side. Not for the first time, either. S60 applications have been broken between upgrades several times before, too.

This track record is highly worrying. Despite their years of practice and ambitions to have a lively third party mobile applications market, Nokia has clearly not grasped the importance of a stable platform to the developers they mean to attract. This lack of understanding of one of the most basic requirements is enough to counter pretty much everything I wrote about Maemo versus its closest competitors a few paragraphs earlier.

Contrast the above with the iPhone OS 2.0 to 3.0 transition. Sure, a few things did change. However, developers were given months of notice ahead of time, and the changes, apart from added functionality, were all pretty minor. Of course, Apple has a long history of making major upgrades while retaining backwards compatibility, with the Mac OS 68k to PowerPC, then to OS X, then to Intel CPU transitions.

It's also taken a LONG time for this device to be announced. I don't know, but I get the feeling it's something like a year late. The break in launch schedule between N810 and N900, the amount of changes in the Maemo platform, and the design of the device compared to for instance the N97 all scream "last year" to me. Besides, everyone knew this was coming ages ago. In the time between the launches of N810 and N900, Apple has managed to update the iPhone twice. This lack of predictability in the release cycle doesn't bode well for the next device in the line.

There is nothing more important for progress in software development than cycle time. The only cost-effective, productive way of making software today is to get feedback on it often, and the longer it stays unreleased, the later the feedback is when it comes. This seems to be another area where Nokia has not been able to shake off their "we make hardware" mentality. Unlike hardware, software can be updated with no extra cost. That's an advantage nearly everyone else has learned to make use of, and one that Nokia, if they truly desire to become a software and services powerhouse, has to finally take to heart.

N900 is not an "iPhone killer". I don't think it's meant to be. When its development started, it's unlikely the iPhone had even been announced. However, it's the best chance for Nokia to ever develop a device better than an iPhone. I hope they will - the world needs competition, and I would like to see Nokia be part of that. However, at this rate they will never catch up - Apple will have released two more major updates before the next Maemo device unless Nokia gets their act together.

I'm still hoping.

Wednesday 27 May 2009

What we're looking for in a data integration tool

As our data warehousing process grows and the workflows get more complex, we've revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices in efficiency that would entail meant we're better off using something else until a distributed processing platform is the only thing that can get the job done. I'm also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM's Infosphere Streams, and other similar approaches. Still, I think I'll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.

So, we had a look at what's going on in the Open Source data integration field. It seems the leaders in that field are Pentaho with Kettle/Pentaho Data Integration, and Talend with Open Studio and Talend Integration Suite. Both seem pretty even in terms of features. Both companies are a bit difficult to approach as a potential customer, so I figured I should also try what would come up from the OSS approach of just posting my thoughts on the Interweb ;)

Besides the technical pilot implementations we've made to compare basic workflow of the various tools, below is a sample of the kind of questions we're considering when evaluating the suitability of the tools.

Product roadmap, release schedule and size of the development team

  • How often, and with what scope of changes, should we expect and prepare for platform upgrades?
  • What is the past track record on keeping to a regular update schedule?

Data lineage and dependency, Impact analysis

  • How to find out which tables are being used for deriving DWH dimensions and facts?

Logging, auditing, monitoring on row and job level

  • How to monitor and archive workflows on a row level (number of rows being inserted/updated/deleted)?
  • How to maintain, access and query a job execution history (start time/end time/return code)?
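
To make the two questions above concrete, here is a minimal sketch of the kind of job execution history we have in mind, written against SQLite purely for illustration. The table name `job_history`, the helpers `open_history` and `run_job`, and the convention that a job returns a dict of row counts are all assumptions of this sketch; none of it reflects how Kettle or Talend actually store their logs.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) a hypothetical job history store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_history (
               run_id        INTEGER PRIMARY KEY AUTOINCREMENT,
               job_name      TEXT NOT NULL,
               started_at    REAL NOT NULL,
               ended_at      REAL,
               return_code   INTEGER,
               rows_inserted INTEGER DEFAULT 0,
               rows_updated  INTEGER DEFAULT 0,
               rows_deleted  INTEGER DEFAULT 0)"""
    )
    return conn


def run_job(conn, job_name, job):
    """Run job() and record start time, end time, return code and row counts.

    Assumption: job() returns a dict with any of the keys
    'inserted', 'updated', 'deleted'. Any exception is recorded
    as return code 1 instead of being re-raised.
    """
    cur = conn.execute(
        "INSERT INTO job_history (job_name, started_at) VALUES (?, ?)",
        (job_name, time.time()),
    )
    run_id = cur.lastrowid
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    return_code = 0
    try:
        counts.update(job())
    except Exception:
        return_code = 1
    conn.execute(
        """UPDATE job_history
              SET ended_at = ?, return_code = ?,
                  rows_inserted = ?, rows_updated = ?, rows_deleted = ?
            WHERE run_id = ?""",
        (time.time(), return_code,
         counts["inserted"], counts["updated"], counts["deleted"], run_id),
    )
    conn.commit()
    return run_id
```

With something like this in place, the history can be queried with plain SQL, for example `SELECT job_name, return_code FROM job_history WHERE return_code != 0` to list failed runs. What we'd want to know from a tool is whether it gives us an equivalent out of the box.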

Version control

  • How to track and restore changes in jobs?

Multi-user environment

  • How can several developers work together?

Change Data Capture

  • How to assist incremental loads?
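
As an illustration of what "assisting incremental loads" could mean at its simplest, below is a sketch of timestamp-based polling with a high-water mark. This is only one basic change data capture technique, and not necessarily what either tool implements. The table names `src` and `dst`, their columns, and the function `incremental_load` are hypothetical.

```python
import sqlite3


def incremental_load(source, target, last_seen):
    """Copy rows changed after last_seen from src to dst.

    Assumed schema on both sides (hypothetical):
        (id INTEGER PRIMARY KEY, value TEXT, updated_at INTEGER)
    Returns the new high-water mark to pass in on the next run.
    """
    rows = source.execute(
        "SELECT id, value, updated_at FROM src "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row_id, value, updated_at in rows:
        # INSERT OR REPLACE acts as an upsert keyed on the primary key.
        target.execute(
            "INSERT OR REPLACE INTO dst (id, value, updated_at) VALUES (?, ?, ?)",
            (row_id, value, updated_at),
        )
        last_seen = max(last_seen, updated_at)
    target.commit()
    return last_seen
```

Note that this approach never sees deleted rows and depends on the source keeping `updated_at` current, which is exactly why we're asking what a tool offers beyond it.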

Data profiling

  • How can a data source be examined?

Job recovery

  • How to recover from possible failures in jobs (such as lost database connection)?

Deploy jobs

  • How to move jobs from one repository to another (development to testing to production)?

Monday 4 May 2009

What does Oracle mean for Java?

Over the past two weeks I've been mostly focused on MySQL, but the big-ticket item in the Sun/Oracle deal is not databases, it's Java. However, it's also the domain whose future is far harder to predict. It was a big deal when Sun decided to open source Java, but the fact of the matter is that the first fully open source release isn't out yet, and Sun has been keeping the testing and certification kit off-limits for open source communities. This means it would still be far too easy for OpenJDK to be killed off.

I've been keeping clear of Oracle for several years, and can't even begin to guess what their position on this is. Oracle has been a pretty active contributor to Linux in particular for several years, and I'm sure their open source strategy and how it works together with their business is pretty well established within at least the engineering parts of the company. At the same time, their notoriously aggressive market tactics make sure that everyone's wary of their next move. Java is a huge part of Oracle's business, and after they purchased BEA, I wouldn't be surprised if Oracle wasn't already the biggest Java company (in terms of revenue) ahead of both Sun and IBM. After completing the Sun acquisition, that'll be guaranteed.

That's a big balance shift for the overall Java community. Now, Oracle is a smart company. My worry is they might emphasize short-term tactical market advantage (owning all of Java, JRockit, Glassfish and WebLogic to compete against other middleware and business applications) over the long-term strategic benefit of a unified platform competing with .NET and the host of open source platforms from PHP and Ruby to Python. With such a wide field, following up on and improving the open source platform process would be the right thing to do - and it would help me :)

Tuesday 28 April 2009

The MySQL community outlook

While I cannot consider myself a member of MySQL's community of developers, I've been watching those developments the same way I follow the development of Linux and many of the Java and Apache projects our own services depend on. It was great to meet many of the core members of the development community and get some insight into their thoughts about the future.

Baron Schwartz called in his Percona Performance Conference keynote on Thursday for a new, active MySQL community to take the driver's seat in the development of the database, not just in the incremental way of bug fixing and performance improvement, but also by setting a vision for the next generation MySQL. It's a call to action greatly needed, and an important one despite the active existence of the Drizzle project. This is because while Drizzle already has a vision for the future, it's a radical departure for the MySQL userbase and one which will not necessarily have a smooth upgrade path. Many of the same MySQL users feeling most of the pain of MySQL's current limitations are also those who will not be able to easily upgrade to a radically different architecture due to the amount of data and dependencies in their existing infrastructure.

It's a gap which needs a careful approach of incremental changes to the MySQL base functionality to help users bridge over to a new, brighter future. These changes do not need to be slow. Rapid incremental changes are likely to be easier to digest when there is a clear upgrade and downgrade path from iteration to iteration. That lets the organizations with the biggest infrastructures set their own pace through the transition, rather than being forced to take one huge leap and risk crashing into the concrete wall of unexpected incompatibility.

A few such incremental community improvements I learned a great deal about during the week were the performance and scalability improvements by Google and Percona and their MySQL 5.4 equivalents; the Xtrabackup utility, which is not only an alternative to but an improvement on the Innobackup tool, which has significant limitations in large-scale deployments; and the Tungsten Replicator, which provides useful cross-database replication and rapid failover features that help with upgrades and transitions to new database installations while minimizing downtime and impact on users. I'm also curious about the storage engine development by Primebase - I don't think there's ultimately a lot of room for multiple transactional storage engines, but as a competitive research topic, it's certainly good to see alternatives to InnoDB.

[Be sure to check out my earlier posts of the conference learnings as well!]

Friday 21 September 2007

MySQL Community vs Enterprise tension

I probably don't spend quite enough time following progress around MySQL considering how critical the product is to us. I'd like to consider it part of the infrastructure in the way I treat Red Hat Enterprise Linux, i.e. something I can trust to make good progress and follow up on a quarterly basis. Naturally we have people who watch both much more closely, but my time simply should be, and pretty much is, spent doing something else.

However, it seems MySQL really demands a bit more attention right now. Today I went and read Jeremy Cole's opinion about MySQL Community (a failure), and I have to say I agree on many of the points. MySQL simply has not yet found a model that works as well as that of Red Hat's Fedora vs Enterprise Linux - that is, really giving the Community edition to the community to direct, and using the Enterprise edition as a platform for enterprises to depend on.

I feel the fundamental problem really is quite simple: as long as MySQL maintains the community edition (both binaries AND the source tree) themselves, and doesn't let the community integrate features into it in a timely manner, the model will not function, not even for their paying customers (us included). However, if they reverse this particular point from the current status quo, all of the other benefits are inevitable.

The comparison to Fedora and RHEL is rather obvious, despite the distribution vs single product differences. Fedora is a great community Linux distribution with the latest-and-greatest features integrated into it in a very timely fashion. Not even Ubuntu can really compete with Fedora in terms of features. However, what Fedora gives up to reach this is a certain amount of polish and reliability. I will happily use Fedora as a personal platform, because of the latest features, but I would not pretend to run a stable system on top of it. For that, I'd rather choose something a bit more mature, that has proven itself in the community and received further QA ahead of commercial release. This is RHEL, and this is what MySQL Enterprise should be. A version that, when it's released, I shouldn't have to hesitate to install on a new production server.

Today I also learned about the Dorsal Source MySQL community release. Now this looks like what the MySQL Community release probably should be like. I'll have to give it a test round and see what's up.

Update: Baron Schwartz describes a MySQL Enterprise that I would have far less trouble using than the existing one.