Fishpool

Tag - MySQL

Thursday 13 January 2011

A last look at 2010... and what's in sight?

For a few years, I've tried to recap here some events I've found notable over the past year and to offer some guesses on what might be ahead of us. I'm somewhat late on these things this year, due to being busy with other stuff, but I didn't want to break the tradition, no matter how silly my wrong guesses might seem later. And again, others have covered the general trends, so I'll try to focus on specifics, in particular as they relate to what I do. For a look at what we achieved for Habbo, see my recap post on the Sulake blog.

This time last year Oracle still had not successfully completed the Sun acquisition due to some EC silliness, but that finally happened during 2010. It seems to be playing out about how I expected it to - MySQL releases have started to appear (instead of just being announced, which was mostly what MySQL AB and Sun were doing), and they actually are improvements. Most things are good on that front. On the other hand, Oracle is exerting license force on the Java front, and hurting Java's long-term prospects in the process, just at a time when things like Ruby and Node.js should put the Java community on the move to improve the platform. Instead, it looks like people are beginning to jump ship, and I can't blame them.

A couple of things surprised me in 2010. Nokia finally hired a non-Finn as a CEO, and Microsoft's Kinect actually works. I did mention camera-based gesture UIs in my big predictions post, but frankly I wasn't expecting it to actually happen during 2010. Okay, despite the 8 million units, computer vision UIs aren't a general-purpose mass market thing yet, but the real kicker here is how easy Kinect is to use for homebrew software. We're going to see some amazing prototypes and one or two actual products this year, I'm sure.

In terms of other software platform stuff, much hot air has been moved around iOS, Android, JavaScript and Flash. I haven't seen much that would have made me think it'd be time to reposition yet. Native applications are on their way out (never mind Mac App Store, it's a last-hurrah thing for apps which don't have an Internet service behind them), and browser-based stuff is on its way in. Flash is still the best browser-side applications platform for really rich stuff, and while JavaScript/HTML5/Canvas is coming, it's not here yet. For more, see this thread on Quora where I commented on the same. Much of the world seems to think that the HTML5 video tag, H.264 and VP8 equate to the capabilities of Flash; that's quite off-base.

On the other hand, tablets are very much the thing. I very much expect that my Galaxy Tab will be outdated by next month, and am looking forward to the dual-core versions which probably will be good for much, much more than email, calendar, web and the occasional game. Not that I'm not already happy about what's possible on the current tablets -- I carry a laptop around much less already. And in terms of what it means for software -- UIs are ripe for a radical evolution.

The combination of direct touch on handheld devices and camera-read gestures on living-room devices is already here, and I expect both to shift on to the desktop as well. Not by replacing keyboards, nor necessarily mice, but I'm looking forward to soon having a desktop made out of a large near-horizontal touchscreen for arranging stuff replacing the desk itself, a couple of large vertical displays for presenting information, camera vision for helping the computer read my intentions and focus on stuff, and keeping the keyboard around for rapid data entry. One has to remember that things for which fingers are enough are much more efficiently done with fingers than by waving the entire hand around.

Will I have such a desk this year? Probably not. At the workplace, I move around so much that a tablet is more useful, and at home, time in front of a desktop computer grew rather more infrequent with the arrival of our little baby girl a few weeks ago. But those are what I want "a computer" to mean to her, not these clunky limited things my generation is used to.

Monday 4 January 2010

Happy 2010 - it's review time

I was happily snowboarding and skiing (the latter for the first time in two decades) last week, so here comes the year-end review a week late. Last year, I harped on Facebook's closed nature, and over the year they've tried to open more of the users' data over to the Internet. Still, there are no decent APIs for a user to pull out everything they've posted to Facebook to have their own copy. That doesn't seem to stop them from dominating the Internet for the time being, though, so good for them.

I'm trying to think of what would have surprised me over the year, but given I failed to make many accurate predictions myself, things just seemed to happen in a pretty natural direction. Oracle's Sun acquisition over in April was a bit of a surprise at the time, but since then, I've grown to appreciate how it might make sense for Oracle. However, what still baffles me is that the EC is going along with Monty's campaign of blocking the completion of that acquisition. Look, guys - the entire world does not need to agree on a commercial transaction in order for one to go through! MySQL is not the important thing here overall, Java is.

We managed to complete a few major transitions for Habbo, most notably replacing the Shockwave client, which was getting a bit long in the tooth, with an all-new Flash-based Habbo Hotel, and integrating Habbo with Facebook and other social networks. I didn't write about either of those launches here at the time, but these are pretty huge things for us because they make approaching Habbo much easier for a new user, and enable us to create all kinds of interesting features that would not have made sense previously.

So, what do I expect from 2010? Well, did the mobile Internet already happen? If not, at least it has a fighting chance this year. I'm having a hard time identifying any people close to me who're not using some Internet services on their phone by now, and some seem to be doing that almost exclusively on a phone. That must mean the rest of the world is close on their heels. As for more predictions, others have taken care of them by now.

One promise I can make is to try to do my part in making the Internet more fun and more social. At least now that even newspapers are beginning to think that asking their readers for money is not just a utopia, we can focus on the apps themselves, not whether they're ad-supportable.

Have a great year MMX!

Wednesday 11 November 2009

MySQL - could we please move on already?

I've kept away from this debate since last April, but this eternal dragging-on is getting to me. Could we please move on already regarding the Oracle-Sun-MySQL decision? I'm a customer of MySQL, and I don't really savor the idea of becoming a customer of Oracle. Even so, I'd much rather see Oracle own it, than leave it straggling, let alone see this process drag on and on. This is helping no one.

I'm using a product from a company from which I buy commercial support, but I could switch to using a binary-compatible Open Source tool any day I chose. I am not bound to remaining a customer of the company I'm buying support from for any period longer than the current contract. I can definitely live with that obligation. I can live with the OSS-tool (whether we want to call it MySQL Community, Percona, MariaDB or whatever, I don't care) instead of the commercial product - in fact, I'm getting the understanding that the OSS-tool may in fact be better suited to my requirements than the product. So, I have no issue with being a customer of Oracle, should the merger go through, because I am not bound to them. I can see as much interesting related technology being developed outside the discussed commercial unit as inside it, so I'm certainly not worried about the future of the tech.

At this point in time, I could buy support from at least a couple of different organizations to replace and extend that which I've bought from MySQL/Sun. I have absolutely no reason to think that option would go away should the merger be approved, despite what certain founders now claim. If it's not commercially possible to develop and support a database product without being in full control over its copyright, then how come Percona has a business? If it's possible to provide such support for GPL software on a limited basis, but not on a big-business enterprise level, then how come Red Hat is a successful public company?

I use MySQL as an infrastructure component to run a business which could be described as software-as-a-service. I do not redistribute the code base as part of a licensed product. There are companies who do that, but they've always done it with the full understanding that what they're doing is dependent on having to license something from an independent party over which they have no control. If they don't like licensing from Oracle, then they can choose to re-engineer their solution to work on top of some other database engine. It's not like those don't exist, or like technology, licensed or not, hasn't always carried that risk with it.

I can't avoid thinking that some of the parties keeping this thing from reaching completion are dreaming of Skype -- selling the same business twice. Hey, more power to them if that happens, but frankly, that was dependent on eBay making a stupid deal at the time. I just do not see what that has to do with anti-trust and why the European Commission needs to be involved. THIS is hurting the market, more so than Oracle is likely to.

I have nothing further on the matter. Thank you for your attention.

Saturday 3 October 2009

Some scaling observations on Infobright

A couple of days ago, Baron Schwartz posted some simple load and select benchmarking of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures where MonetDB beat Infobright in most queries.

Used to the parallel IEE loader, I was surprised by the apparent slow loading speed of Baron's benchmark and decided to try and replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because this installation is severely resource-constrained below Infobright's recommended smallest installation. However, you can reproduce the results yourself with the attached script, and I will note some observations.

Continue reading...

Monday 21 September 2009

A peek under the hood in Infobright 3.2 storage engine

I've been meaning to post some real-world data on the performance of the Infobright 3.2 release which happened a few weeks ago after an extended release candidate period. We're just preparing our upgrades now, so I don't have any performance notes over significant data sets or complicated queries to post quite yet.

To make up for that, I decided to address a particular annoyance of mine in the community edition, first because it hadn't been addressed in the 3.2 release (and really, I'm hoping doing this would get it included in 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to being a pretty amazing piece of performance for such a simple, undemanding package in general, and not making use of that would be a shame. Read on for details.

Continue reading...

Wednesday 27 May 2009

What we're looking for in a data integration tool

As our data warehousing process grows and the workflows get more complex, we've revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices on efficiency that would entail meant we're better off using something else until a distributed processing platform is the only thing that can get the job done. I'm also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM's Infosphere Streams, and other similar approaches. Still, I think I'll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.

So, we had a look at what's going on in the Open Source data integration field. It seems the leaders in that field are Pentaho with Kettle/Pentaho Data Integration, and Talend with Open Studio and Talend Integration Suite. Both seem pretty even in terms of features. Both companies are a bit difficult to approach as a potential customer, so I figured I should also try what would come up from the OSS approach of just posting my thoughts on the Interweb ;)

Besides the technical pilot implementations we've made to compare basic workflow of the various tools, below is a sample of the kind of questions we're considering when evaluating the suitability of the tools.

Product roadmap, release schedule and size of the development team

  • How often should we expect platform upgrades, and how large a scope of changes should we prepare for?
  • Past track record on keeping to a regular updates schedule

Data lineage and dependency, Impact analysis

  • How to find out which tables are being used for deriving DWH dimensions and facts?

Logging, auditing, monitoring on row and job level

  • How to monitor and archive workflows on a row level (amount of rows being inserted/updated/deleted)?
  • How to maintain, access and query a job execution history (start time/end time/return code)?

Version control

  • How to track and restore changes in jobs?

Multi-user environment

  • How can several developers work together?

Change Data Capture

  • How to assist incremental loads?

Data profiling

  • How can a data source be examined?

Job recovery

  • How to recover from possible failures in jobs (such as lost database connection)?

Deploy jobs

  • How to move jobs from one repository to another (development to testing to production)?
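
To make the logging and auditing questions above more concrete, here is a minimal sketch of the kind of job execution history we have in mind: one record per run with start time, end time, return code, and row counts. It is not taken from Kettle, Talend, or any other tool mentioned here; the `job_history` table, its columns, and the functions `init_history`, `run_job` and `last_runs` are all hypothetical names made up for illustration, and SQLite stands in for whatever repository database a real tool would use.

```python
import sqlite3
from datetime import datetime, timezone


def init_history(conn):
    """Create the (hypothetical) job_history table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS job_history (
            id            INTEGER PRIMARY KEY AUTOINCREMENT,
            job_name      TEXT    NOT NULL,
            start_time    TEXT    NOT NULL,
            end_time      TEXT    NOT NULL,
            return_code   INTEGER NOT NULL,
            rows_inserted INTEGER NOT NULL,
            rows_updated  INTEGER NOT NULL,
            rows_deleted  INTEGER NOT NULL
        )
        """
    )


def run_job(conn, job_name, job):
    """Run job() and record its outcome.

    job is assumed to return a dict with any of the keys
    "inserted", "updated", "deleted"; a raised exception is
    recorded as return code 1 with zero row counts.
    """
    start = datetime.now(timezone.utc).isoformat()
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    return_code = 0
    try:
        counts.update(job())
    except Exception:
        return_code = 1
    end = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO job_history (job_name, start_time, end_time, return_code,"
        " rows_inserted, rows_updated, rows_deleted)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (job_name, start, end, return_code,
         counts["inserted"], counts["updated"], counts["deleted"]),
    )
    conn.commit()
    return return_code


def last_runs(conn, job_name, limit=5):
    """Return the most recent runs of a job, newest first."""
    cur = conn.execute(
        "SELECT return_code, rows_inserted, rows_updated, rows_deleted"
        " FROM job_history WHERE job_name = ? ORDER BY id DESC LIMIT ?",
        (job_name, limit),
    )
    return cur.fetchall()
```

Calling `run_job(conn, "load_dim", some_callable)` on a connection prepared with `init_history` leaves a queryable trail of runs. What we're asking of the tools is whether they give us something equivalent out of the box, rather than making us bolt it on ourselves.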

Sunday 24 May 2009

Hello, MySQL 6.0, err, something

I'm conflicted about the latest twist of the MySQL release saga, i.e. the announcement of the 6.0.11 alpha version and the accompanying note that it's the last 6.0 release and will be replaced by the already discussed milestone model. From an engineering point of view, I think this is the right step. I'm not sure about that, though, because I can't really tell exactly which engineering model was chosen: trunk-first, then backport, or fix-in-releases, then forward port. I also can't tell whether the milestone model is going to be timeboxed or feature-scoped. Personally, I would prefer to see the former of both alternatives.

From a customer point of view, I'm even more confused, though much less concerned. Okay, so 6.0 won't become the marketing version number of any MySQL Enterprise release? Doesn't matter. 5.4 needs to come out first anyway, preferably sooner with a concrete, well-tested feature set, than later with more planned-but-unfinished features stuffed in it. What the release after that is going to be called makes no difference to me, as long as it's also going to contain solid improvements and comes out on a predictable schedule that doesn't force me to look for something drastically different in order to deal with scale.

That being said, it's still weird. So if the thought of a 6.0 GA release is scrapped, why release anything and still call it 6.0? I guess it's just tying up loose ends, but that's an engineering thing, and only the number of existing source branches with stuff to merge together matters, not the version number put to it...

Tuesday 12 May 2009

Confusing Sun communication about MySQL 5.4

Just received an email newsletter from Sun titled "MySQL 5.4 Preview Release" which states:

Sun Microsystems recently released MySQL 5.4, delivering performance and scalability improvements enabling the InnoDB storage engine to scale up to 16-way x86 servers and 64-way CMT servers.

MySQL 5.4 also includes new subquery optimizations and JOIN improvements, resulting in 90% better response times for certain queries.

Apparently, the confusion about the contents of the release I wrote about earlier continues to reign inside Sun as well. MySQL 5.4 has not been released by any reasonable meaning of the word, since there's "only" a preview available at this time. Compare this to Windows 7: that's already a Release Candidate, but it has not been released. Also, the preview release available does not include new subquery optimizations nor JOIN improvements. Having planned such improvements doesn't count.

As I wrote earlier, the best of the rather bad excuses for the release labeling offered to me was that Sun wanted to avoid confusion by not releasing many versions at once. I think that got replaced (and then some) by plenty of extra confusion about when and what was released, instead. Sorry, no good. Try again, 'kthxbye.

Monday 4 May 2009

What does Oracle mean for Java?

Over the past two weeks I've been mostly focused on MySQL, but the big-ticket item in the Sun/Oracle deal is not databases, it's Java. However, it's also the domain which is far less clear to predict. It was a big deal when Sun decided to open source Java, but the fact of the matter is that the first fully open source release isn't out yet, and Sun has been keeping the testing and certification kit off-limits for open source communities. This means it would still be far too easy for OpenJDK to be killed off.

I've been keeping clear of Oracle for several years, and can't even begin to guess what their position on this is. Oracle has been a pretty active contributor to Linux in particular for several years, and I'm sure their open source strategy and how it works together with their business is pretty well established within at least the engineering parts of the company. At the same time, their notoriously aggressive market tactics make sure that everyone's wary of their next move. Java is a huge part of Oracle's business, and after they purchased BEA, I wouldn't be surprised if Oracle were already the biggest Java company (in terms of revenue) ahead of both Sun and IBM. After completing the Sun acquisition, that'll be guaranteed.

That's a big balance shift for the overall Java community. Now, Oracle is a smart company. My worry is they might emphasize short-term tactical market advantage (owning all of Java, JRockit, Glassfish and WebLogic to compete against other middleware and business applications) over long-term strategic benefit of a unified platform competing with .NET and the host of open source platforms from PHP and Ruby to Python. With such a wide field, following up on, and improving on the open source platform process would be the right thing to do - and it would help me :)

Tuesday 28 April 2009

The MySQL community outlook

While I can not consider myself a member of MySQL's community of developers, I've been watching those developments the same way I follow the development of Linux and many of the Java and Apache projects our own services depend on. It was great to meet many of the core members of the development community and get some insight into their thoughts about the future.

Baron Schwartz called in his Percona Performance Conference keynote on Thursday for a new, active MySQL community to take the driver's seat in the development of the database, not just in the incremental improvements way of bug fixing and performance improvement, but also by setting a vision for the next generation MySQL. It's a call to action greatly needed, and an important one despite the active existence of the Drizzle project. This is because while Drizzle already has a vision for the future, it's a radical diversion for the MySQL userbase and one which will not necessarily have a smooth upgrade path. Many of the same MySQL users feeling most of the pain of MySQL's current limitations are also those who will not be able to easily upgrade to a radically different architecture due to the amount of data and dependencies in their existing infrastructure.

It's a gap which needs a careful approach of incremental changes to the MySQL base functionality to help users bridge over to a new, brighter future. These changes do not need to be slow. Rapid incremental changes are likely to be easier to digest with a clear upgrade and downgrade path from iteration to iteration leaving the organizations with biggest infrastructures to consider a way to set their own pace through the transition, rather than being forced to take one huge leap and risk a crash to the concrete wall of unexpected incompatibility.

A few such pieces of incremental community improvements I learned a great deal about during the week were the performance and scalability improvements by Google and Percona and their MySQL 5.4 equivalents, the Xtrabackup utility, not only as an alternative to, but as an improvement on, the Innobackup tool, which has significant limitations to its use in large-scale deployments, and the Tungsten Replicator, which provides useful cross-database replication and rapid failover features that help upgrades and transitions to new database installations while minimizing downtime and impact to users. I'm also curious about the storage engine development by Primebase - I don't think there's ultimately a lot of room for multiple transactional storage engines, but as a competitive research topic, it's certainly good to see alternatives to InnoDB.

[Be sure to check out my earlier posts of the conference learnings as well!]