Tag - Performance

Thursday 18 July 2013

The difference in being demanding and acting like a jerk

Sarah Sharp is a member of a very rare group. She's a Linux kernel hacker. Even among that group, she's unique - not because of her gender (though that probably is distinctive in many of the group's social gatherings), but because she's brave enough to demand civil behavior from the leaders of the community. I applaud her for that.

Now, I have immense respect for Linus Torvalds and his crew. I've been a direct beneficiary of Linux in both professional and personal contexts for nearly two decades. The skills this group demonstrates are possibly only matched by the level of quality they demand from each other. Unfortunately, that bar is often applied only on the technical level, while the tone of discussion, both in person and on the mailing lists, can turn quite hostile at times. It's been documented many times, and I can add no value by rehashing that.

However, I wanted to share some experience from my own career as a developer, a manager of developers, and someone who has both been described as demanding and needed to report to others under very demanding circumstances. I've made some of these mistakes myself, and hope to have learned from them.

Perhaps not so surprisingly, the same people in the community who defend hostile behavior also, almost as a rule, misunderstand what it means to behave professionally. There's a huge difference between behaving as in a workplace where people are getting paid, and behaving professionally. The latter is about promoting behaviors which lead to results. If being an asshole were effective, I'd have no problem with it. But it's not.

To consistently deliver results, we need to be very demanding of ourselves and of others. Being an uncompromising bastard with regard to the results will always beat accepting inferior results when we measure technical progress in the long run -- though sometimes experience tells us a compromise truly is "good enough".

However, that should never be confused with being a bastard in general. Much can (and should) be said about how being nice to others means we're all that much happier, but I have something else to offer. Quite simply: people don't like being called idiots or having personal insults hurled at them. They don't respond well to those circumstances. Would you like it yourself? No? Don't expect anyone else to, either. It's not productive. It will not ensure delivering better results in the future.

Timely, frequent and demanding feedback is extremely valuable to results, to the development of an organization, and to the personal development of the individuals. But there are different types of communication, and not all of it is feedback. Demanding better results isn't feedback, it's setting and communicating objectives. Commenting on people's personalities, appearance, or the like, let alone demanding that they change themselves as a person, is neither feedback nor reasonable. Feedback is about observing behavior and demanding changes in the behavior, because behavior leads to results. Every manager has to do it, never mind whether they're managing a salaried or voluntary team.

However, calling people names is bullying. Under all circumstances. While it can appear to produce results (such as, making someone withdraw from an interaction, thus "no longer exhibiting an undesirable behavior"), those results are temporary and come with costs that far outweigh the benefits. It drives away people who could have been valuable contributors. What's not productive isn't professional. Again - I'm not discussing here how to be a nicer person, but how to improve results. If hostility helped, I'd advocate for it, despite it not being nice.

The same argument can be made about hostile language that is directed not at people but at results. Some people are more sensitive than others, and if by not offending someone's sensibilities you get better overall results, it's worth changing that behavior. However, unlike "do not insult people", use of swearwords toward something other than people is a cultural issue. Some groups are fine with it, or indeed enjoy an occasional chance to hurl insults at inanimate objects or pieces of code. I'm fine with that. But nobody likes to be called ugly or stupid.

In the context of the Linux kernel, does this matter? After all, it seems to have worked fine for 20 years, and has produced something the world relies on. Well, I ask you this: is it better to have people selected (or have them select themselves) for Linux development by their technical skills and capability to work in an organized fashion, or by those things PLUS an incredibly thick skin and capacity to take insults hurled at them without being intimidated? The team currently developing the system is of the latter kind. Would they be able to produce something even better if that last requirement weren't there? Does that requirement help the group become better? I would say hostility IS hurting the group.

Plus, it's setting a very visible, very bad precedent for all other open source teams, too. I've seen other projects wither and die because they've copied the "hostility is ok, it works for Linux" mentality while losing out on the skills part of the equation. They didn't have to end that way, and it's a loss.

Monday 21 September 2009

A peek under the hood in Infobright 3.2 storage engine

I've been meaning to post some real-world data on the performance of the Infobright 3.2 release which happened a few weeks ago after an extended release candidate period. We're just preparing our upgrades now, so I don't have any performance notes over significant data sets or complicated queries to post quite yet.

To make up for that, I decided to address a particular annoyance of mine in the community edition, first because it hadn't been addressed in the 3.2 release (and really, I'm hoping that doing this gets it included in 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to being a pretty amazing piece of performance for such a simple, undemanding package in general, and not making use of that would be a shame. Read on for details.

Thursday 30 April 2009

The difference between conversion and retention

Picked up a piece of analysis today from my newsfeed regarding Twitter audience. Nielsen has posted information about Twitter's month-to-month retention (40%) and compared that to Facebook's and MySpace's. Pete Cashmore over at Mashable promptly misread the basic information and came to an entirely wrong conclusion about the stats, titling his post about it as "60% quit Twitter in the first month". A simple misunderstanding of basic audience analysis like this is the crucial difference between explosively growing traffic and a failure. That's a fail for you, Pete.

What's wrong? Well, retention is a separate matter from conversion. 40% conversion from a trial registration to being a continuing active user to the second month would not be a bad conversion rate. It's not stratospherically great, I've seen better, but I wouldn't be terribly unhappy about such a figure. However, Nielsen didn't say anything at all about first-to-second month conversion. This is what they DID say: "Twitter’s audience retention rate, or the percentage of a given month’s users who come back the following month, is currently about 40 percent."

That's pretty plain English when you take the time to read it. Month to month, regardless of visitor lifetime, not first to second month. On this metric, 40% retention is not good at all, and will definitely be a limiting factor to Twitter's traffic and audience size over time, just as the Nielsen article points out (and shows the math for). For any given retention rate, there is a certain maximum audience reach beyond which new traffic can't overcome the leaving base, since new traffic is not an inexhaustible supply.
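To make the ceiling concrete, here is a minimal sketch of the arithmetic. It is not Nielsen's own calculation: it assumes a constant inflow of new users every month and a flat retention rate that ignores visitor lifetime, and the function names and the one-million figure are made up for illustration. Under those assumptions the audience follows A = A * r + N each month and levels off at N / (1 - r).

```python
def audience_over_time(new_users_per_month, retention, months):
    """Active audience per month under a flat month-to-month retention rate."""
    audience = 0.0
    history = []
    for _ in range(months):
        # Last month's users who came back, plus this month's newcomers.
        audience = audience * retention + new_users_per_month
        history.append(audience)
    return history


def audience_ceiling(new_users_per_month, retention):
    """Steady state of A = A * r + N, which solves to A = N / (1 - r)."""
    return new_users_per_month / (1.0 - retention)


if __name__ == "__main__":
    # Hypothetical inflow of one million new users per month.
    history = audience_over_time(1_000_000, 0.40, 24)
    print("audience after 24 months:", round(history[-1]))
    print("ceiling at 40% retention:", round(audience_ceiling(1_000_000, 0.40)))
    print("ceiling at 70% retention:", round(audience_ceiling(1_000_000, 0.70)))
```

In this simplified model the ceiling at 40% retention is only about 1.67 times the monthly inflow of new users, so growing the audience further means either finding ever more new users each month or improving retention.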

And since today is a busy day, that concludes the free startup advice. Take the time to understand the difference between these metrics, you'll thank yourself for it later.

Tuesday 28 April 2009

The MySQL community outlook

While I can not consider myself a member of MySQL's community of developers, I've been watching those developments the same way I follow the development of Linux and many of the Java and Apache projects our own services depend on. It was great to meet many of the core members of the development community and get some insight into their thoughts about the future.

Baron Schwartz called in his Percona Performance Conference keynote on Thursday for a new, active MySQL community to take the driver's seat in the development of the database, not just in the incremental improvements way of bug fixing and performance improvement, but also by setting a vision for the next generation MySQL. It's a call to action greatly needed, and an important one despite the active existence of the Drizzle project. This is because while Drizzle already has a vision for the future, it's a radical diversion for the MySQL userbase and one which will not necessarily have a smooth upgrade path. Many of the same MySQL users feeling most of the pain of MySQL's current limitations are also those who will not be able to easily upgrade to a radically different architecture due to the amount of data and dependencies in their existing infrastructure.

It's a gap which needs a careful approach of incremental changes to the MySQL base functionality to help users bridge over to a new, brighter future. These changes do not need to be slow. Rapid incremental changes are likely to be easier to digest when there is a clear upgrade and downgrade path from iteration to iteration, leaving the organizations with the biggest infrastructures free to set their own pace through the transition, rather than being forced to take one huge leap and risk a crash into the concrete wall of unexpected incompatibility.

A few such incremental community improvements I learned a great deal about during the week were the performance and scalability improvements by Google and Percona and their MySQL 5.4 equivalents; the Xtrabackup utility, not only as an alternative to, but an improvement on, the Innobackup tool, which has significant limitations in large-scale deployments; and the Tungsten Replicator, which provides useful cross-database replication and rapid failover features that help with upgrades and transitions to new database installations while minimizing downtime and impact to users. I'm also curious about the storage engine development by Primebase - I don't think there's ultimately a lot of room for multiple transactional storage engines, but as a competitive research topic, it's certainly good to see alternatives to InnoDB.

[Be sure to check out my earlier posts of the conference learnings as well!]

Monday 27 April 2009

Database innovation on MySQL

If MySQL's core server development and release process has been somewhat of a frustration to the userbase over the past few years, clearly another part of the ecosystem has thrived in ways which brought exciting fruit to the Expo part of this year's conference. MySQL has become a hub of innovation in both transactional and analytics databases in ways which have turned many of my concerns to enthusiasm.

I've already discussed the technologies for data analytics on MySQL, in particular Infobright's storage engine technology. This year I took the opportunity to learn a bit more about their appliance-based competitor Kickfire as well, and it certainly looks like a solid product. I still don't completely understand what the "SQL chip" in their appliance does, but certainly the combination of special-purpose columnar storage, a high-speed memory interface and high-performance indexing should form the basis for a great analytics system. How it compares in practice to Infobright's software-only approach, time will tell. I'd be interested in real-world experiences, so if you have some to share, please get in touch. Finally, I missed the Calpont info myself, but once it is released, I'll try to find the time to try it out.

I'm even more excited about the new solutions on the transactional side of things. I've certainly been among the people frustrated by MySQL/InnoDB's scaling issues on modern hardware, and glad to see that the optimization work done by Google, Innobase and Percona is being accepted into the "mainline" MySQL Enterprise Server. However, what I did not expect to see were the solutions shown by Virident and Schooner for accelerated, Flash-based storage appliances. It's interesting how both of these companies have chosen to apply their platforms to accelerate both InnoDB and Memcached, and I'm looking forward to the chance to spend more time with both solutions. While both are Flash-based approaches, they seem to have made very different architectural choices in the way they're exposing the memory to the software layer, and I'm curious to see the impact those choices have on both IO and storage capacity scaling. In any event, these are unique technologies unlike what I've seen for other platforms at this time. I need to learn how they plan to work with the community and Sun/Oracle in keeping the solutions functionally compatible with the standard MySQL server.

The ecosystem doesn't end at the appliances, though. On the software side of things, I was pleasantly surprised by the state of Primebase's PBXT storage engine as well as Continuent's new Tungsten Replicator. While both are still early in their development path, they seem to hold a lot of promise for improving the performance of MySQL's built-in functionality in InnoDB as well as in the replication subsystem. Robert Hodges's demo of Tungsten's set-up and management also looked like it will greatly simplify replication administration, which is a big deal for anyone who has to manage 20+ replicated database systems. What's more, if Robert and his team crack the multi-threaded replication problem, a major scalability concern is lifted.

[Be sure to check out my earlier posts of the conference learnings as well!]

Wednesday 19 November 2008

Looking for an ETL engineer for our BI team

So, I mentioned earlier that I was looking at Infobright's Brighthouse technology as a storage backend for heaps and heaps of traffic and user data from Habbo. Turns out it works fine (now that it's in V3 and supports more of the SQL semantics), and we took it into use. Been pretty happy with that, and I expect to talk more about the challenge and our solution at the next MySQL Conference in April 2009.

However, our DWH team needs extra help. If you're interested in solving business analytics problems by processing lots of data and the idea of working in a company that leads the virtual worlds industry excites you, let us know by sending us an application. Thanks for reading!

Monday 15 September 2008

Infobright BI tools go open source

I've mentioned Infobright before as an interesting solution for getting more performance out of BI analytics solutions. Today's news is interesting: Sun invests in the company, and the baseline product is open sourced. Too busy to write more about it today, but I'm certainly watching this one closely.

Monday 31 March 2008

Optimizing Linux for random I/O on hardware RAID

There's a relatively little-known feature of Linux I/O scheduling that has a pretty significant effect in large-scale database deployments, at least with MySQL, and a recent article on MySQL Performance Blog prompted me to write about it. This may affect other databases and random I/O systems as well, but we've definitely seen it with MySQL 5.0 on the RHEL 4 platform. I have not studied this on RHEL 5, and since the I/O subsystem and the Completely Fair Queuing (CFQ) scheduler that is the default on RHEL kernels have received further tuning since, I can not say whether the effect still exists.

Though I've heard YouTube discovered these same things, I have not yet seen a simple explanation of why this is so - so I'll take a shot at explaining it.

In short, a deployment with a RAID controller or external storage system visible to the operating system as a single block device will not reach its maximum performance under RHEL default settings, and can be easily coaxed about 20% higher on average random I/O (and significantly higher in spot benchmarks) with a single kernel parameter (elevator=noop) or equivalent runtime tuning via /sys/block/*/queue/scheduler in RHEL5, where you can also set this on a per-device basis.
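Before changing anything, it helps to see which scheduler each block device is actually using. The sketch below reads the same /sys/block/*/queue/scheduler files mentioned above; the kernel lists the available schedulers on one line and marks the active one with square brackets, for example "noop anticipatory deadline [cfq]". The helper names are mine, and the set of schedulers on offer differs between kernel versions, so treat this as an illustration rather than a tool.

```python
import glob


def active_scheduler(sysfs_text):
    """Return the bracketed (active) scheduler, e.g. 'cfq' from 'noop deadline [cfq]'."""
    for name in sysfs_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def report_schedulers(pattern="/sys/block/*/queue/scheduler"):
    """Map each device's sysfs scheduler file to its active scheduler name."""
    report = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                report[path] = active_scheduler(handle.read())
        except OSError:
            # Unreadable entry: record it without a scheduler name.
            report[path] = None
    return report


if __name__ == "__main__":
    for path, scheduler in sorted(report_schedulers().items()):
        print(path, scheduler)
```

Switching a device at runtime is done by writing the scheduler name (such as noop) into that same file as root; the script above only reads, so it is safe to run on a production host.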

We first saw this in 2005 on a quad-CPU server with a RAID controller connected to 10 SCSI disks. At that time, we found that configuring the RAID to expose five RAID-1 pairs which we then striped to a single volume using LVM increased performance despite making the OS and CPU do more work on I/O. The difference in performance was about 20%.

Our most recent proof of the same effect was a quad-CPU server connected to a NetApp storage system over FC. Since it was not convenient to expose multiple volumes from the NetApp to stripe them together, we searched for other solutions and, prompted by a presentation by the YouTube engineers, looked at the I/O scheduling options. We found that a simple way to improve performance was to turn off I/O reordering by the kernel. Again, the overall impact between the settings was about 20%, though at times much greater.

The lesson is simple: reordering I/O requests multiple times provides no benefits, and reordering them too early will in fact be detrimental. Explaining why that is so is a bit involved, and is based on a few assumptions we have not bothered to verify, since the empirical results have supported our conclusions and got us where we wanted.

In order to keep the explanation simple, I will describe it conceptually on a very small scale. When reading this, please take this into account and understand that to measure the effect we have seen in practice, the size of the solution should be increased from what I am describing.

First, consider the case of direct-attached storage exposed to the Linux kernel as independent devices. In this configuration, the kernel maintains a per-device I/O queue, and the CFQ scheduler will reorder I/O requests to each device separately in order to maintain fair per-process balancing, low latency and high throughput. This is the configuration in which CFQ does a great job of maximizing performance, and works fairly well with any number of spindles. As the application (a database in this case) fires random I/O, each of the spindles is executing them independently and serves requests as soon as they are issued. In other words, the system is good at keeping each of the I/O queues "hot". The sustained top I/O rate is roughly linear in the number of spindles, or with 15k rpm drives, about 1000 operations per second for four drives.

Now, let's introduce a hardware RAID of some sort, in particular one which is able to further reorder operations thanks to a "big" battery-backed cache. Thanks to that cache, the RAID can commit thousands of write operations per second for fairly long periods (seconds), flushing them to disk after merging. On the other hand, the kernel now sees just one device, and has one I/O queue to it. The CFQ scheduler sits in front of this queue, reordering pending I/O requests. All is fine until the I/O pressure rises up to about what a single spindle can process on a sustained basis, or about 250 requests per second on those 15k drives. However, as soon as the queue starts building up, the CFQ scheduler kicks into action, and reorders the queue from random to sorted per block number (an oversimplification, but close enough).

All is good? No, it's not. The sequential blocks on that RAID volume are not truly sequential, but reside on different spindles and could thus be processed simultaneously. To demonstrate, let's assume your four-spindle array has one billion sectors, or five hundred gigs, per device, and further, that it is striped at 64k extents, or 7.8 million stripes across each device.

In both configurations, the striping is essentially the same. Every 128 sectors or 64k is on one device, then the next one, and so on. The difference is that with LVM in place, the kernel knows this, while with the RAID, it has no idea of the layout of the array, essentially treating it as a single spindle.

Now, those couple of thousand requests that were just issued contain sequences such as writes to sectors 10, 200, 50, 300, 1020, 600, 1500 and 700. Due to the striping, four of these can be executed simultaneously, so the optimal order to issue these, of course depending on what else might be going on, is something like 10, 200, 300, 1500, 50, 700, 1020, and 600, executed through four queues: [10, 50, 600], [200, 700], [300] and [1020, 1500]. In the LVM configuration this might be what really happens. However, the single I/O queue to the RAID device will have these sorted into ascending block order, and with enough such operations in the queue, the RAID processor no longer has enough of a view of the queue to efficiently re-re-order them to utilize all the spindles, so only some of them are hot at any given time. TCQ should help, but in practice it won't issue enough outstanding requests to fix the problem. In our experience the top sustained rate is not more than 1.5 times one spindle, or 300-400 requests per second, while the array should really run at over 1000 ops per second thanks to the additional persistent cache on the RAID controller.
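The sector-to-spindle mapping in this example is easy to check with a few lines of code. The sketch below assumes plain round-robin striping of 128-sector extents across four spindles, as in the simplified description above; real controllers may lay data out differently. The window parameter is a hypothetical stand-in for how much of the queue the RAID controller gets to look at, not a measured value.

```python
SECTORS_PER_STRIPE = 128  # 64k extents of 512-byte sectors, as in the text
SPINDLES = 4


def spindle_for(sector):
    """Spindle holding a sector, assuming simple round-robin striping."""
    return (sector // SECTORS_PER_STRIPE) % SPINDLES


def per_spindle_queues(sectors):
    """Split a request stream into one queue per spindle, keeping arrival order."""
    queues = [[] for _ in range(SPINDLES)]
    for sector in sectors:
        queues[spindle_for(sector)].append(sector)
    return queues


def spindles_touched(sectors, window):
    """Number of distinct spindles hit by the first `window` requests."""
    return len({spindle_for(sector) for sector in sectors[:window]})


if __name__ == "__main__":
    requests = [10, 200, 50, 300, 1020, 600, 1500, 700]
    # What the kernel could do with one queue per device (the LVM case).
    print(per_spindle_queues(requests))
    # Ascending block order, as one sorted queue would deliver it.
    print(spindles_touched(sorted(requests), 4))
    # The interleaved order suggested in the text.
    print(spindles_touched([10, 200, 300, 1500, 50, 700, 1020, 600], 4))
```

With a window of four requests, the sorted order keeps only three of the four spindles busy, while the interleaved order reaches all four. The gap on eight requests is small, but it illustrates why a long, sorted queue leaves some spindles idle.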

Bottom line: CFQ is great, but only if the kernel actually knows everything about the physical layout of the media. It also looks like some of the recently introduced tuning parameters (which I know nothing about, just noted their appearance) might help avoid the worst hit. However, ultimately it doesn't matter - if your hardware allows efficient "outsourcing" of the I/O scheduling to a large secure cache, use it, and don't bother making the kernel do the job without all the information.

I hope this explanation makes sense, and that I haven't botched any important details or made wrong assumptions. Please comment if any of this is inaccurate.

PS. A tuning guide for Oracle recommends the deadline scheduler due to latency guarantees. We have not benchmarked that against noop.

Sunday 7 October 2007

MySQL and materialized views

I'm working on alternative strategies to make the use and maintenance of a multi-terabyte data warehouse implementation tolerably fast. For example, it's clear that a reporting query on a 275-million row table is not going to be fun by anyone's definition, but that for most purposes, it can be pre-processed to various aggregated tables of significantly smaller sizes.

However, what is not obvious is what would be the best strategy for creating those tables. I'm working with MySQL 5.0 and Business Objects' Data Integrator XI, so I have a couple of options.

I can just CREATE TABLE ... SELECT ... to see how things work out. This approach is simple to try, but essentially unmaintainable; no good.

I can define the process as a BODI data flow. This is good in many respects, as it creates a documented flow of how the aggregates are updated, is fairly easy to hook up to the workflows which pull in new data from source systems, and allows monitoring of the update processes. However, it's also quite work intensive to create all those objects with the "easy" GUIs in comparison to just writing a few simple SQL statements. There are also some SQL constructs that are horribly complicated to express in BODI; in particular, COUNT(DISTINCT ..) is ugly.

Or I could create the whole process with views on the original fact table, with triggered updates of a materialized view table in the database. It would still be fairly nicely documentable, thanks to the straightforward structure of the views, and very maintainable, as the updates would be automatic. A deferred update mechanism with a trigger keeping track of which part of the materialized view needs an update and a periodic refresh through a stored procedure would keep things nicely in sync. MySQL 5.0 even has all of the necessary functionality.

Except.. It's only there in theory. The performance of views and triggers is so horrible that any such implementation would totally destroy the usability of the system. MySQL's views only work as a statement merge when there is a one-to-one relationship between base table and view rows, or in other words, the view can not contain SUM(), AVG(), COUNT() or any of the other mechanisms which would have been the whole point of the materialized view in question. It will fall back to a temp table implementation in these cases, and creating a GROUP BY temp table over 275 million rows without using the WHERE clause is pure madness.

In addition, defining any triggers, however simple, slows bulk loads to the base tables by an order of magnitude. I could of course still work around triggers by implementing the equivalent logging in each BODI workflow and create the materialized views and a custom stored proc to update each one, but having a view there in between was the only way to make this approach maintainable. Damn, there goes that strategy.
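For reference, the shape of the deferred-refresh idea described above (a trigger that only logs which slice of the aggregate is stale, plus a periodic job that recomputes just those slices) can be sketched as follows. To keep it self-contained and runnable, this uses Python's built-in sqlite3 module as a stand-in, not MySQL 5.0, and all table and trigger names are hypothetical. It shows the bookkeeping only and says nothing about the performance problems described above. It also handles inserts only; updates and deletes would need their own triggers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE fact_sales (day TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE mv_daily (day TEXT PRIMARY KEY, total INTEGER, row_count INTEGER);
CREATE TABLE mv_dirty (day TEXT PRIMARY KEY);

-- The trigger does no aggregation; it only records which day is stale.
CREATE TRIGGER fact_sales_ai AFTER INSERT ON fact_sales
BEGIN
    INSERT OR IGNORE INTO mv_dirty (day) VALUES (NEW.day);
END;
"""


def open_warehouse():
    """Create an in-memory database with the fact, aggregate and log tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def refresh(conn):
    """Periodic job: recompute only the days logged as dirty, then clear the log."""
    with conn:  # run the whole refresh as one transaction
        conn.execute("DELETE FROM mv_daily WHERE day IN (SELECT day FROM mv_dirty)")
        conn.execute(
            "INSERT INTO mv_daily (day, total, row_count) "
            "SELECT day, SUM(amount), COUNT(*) FROM fact_sales "
            "WHERE day IN (SELECT day FROM mv_dirty) GROUP BY day"
        )
        conn.execute("DELETE FROM mv_dirty")


if __name__ == "__main__":
    conn = open_warehouse()
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?)",
        [("2007-10-01", 5), ("2007-10-01", 7), ("2007-10-02", 3)],
    )
    refresh(conn)
    print(conn.execute("SELECT * FROM mv_daily ORDER BY day").fetchall())
```

The point of the dirty-log table is that the refresh query always carries a WHERE restriction, so it never has to aggregate the whole fact table in one go.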

Wednesday 29 August 2007

My Top 5 wishlist for MySQL

I (belatedly) noticed a meme running on Planet MySQL regarding wishlist items for the company. I think it started with Jay Pipes and Mårten Mickos, but has since moved on to users. In particular, I'd endorse most of Jeremy Cole's and Ronald Bradford's wishes myself as well.

But let me jump on the bandwagon and offer my view of the things that would most help us run and develop our services.

1. Online table changes. Ronald mentioned this as well, but I have to emphasize this more: for all the good that InnoDB did in terms of eliminating table-level locks for INSERTs and UPDATEs, it has caused almost more pain for us in terms of locks during schema updates. Until you've tried it yourself, you can't imagine the pain of running an ALTER TABLE ADD INDEX on a 150-million row table during a routine application upgrade that would otherwise be over in 10 minutes...

2. Reliable baseline functionality, including replication. This is the big one: MySQL 5.0 improved many things, but one thing it didn't improve is dependability – version 4.1 was far more stable. Maybe we're pushing the boundaries, but when you're managing 20+ database servers and several terabytes of data, what you don't want to start your days with is a check of which slaves have stopped updating, which processes you need to restart, and which scheduled maintenance jobs you need to run again.

3. A smarter query planner. In addition to our normal application OLTP-style development, we're currently busy working on a multi-terabyte DWH project. MySQL happily proceeds to execute three-table JOIN queries sequentially scanning 200 million row facts when it can't quite figure out whether a query selects 20 or 200 rows from a dimension table. Oops. Please come back after lunch...

4. Index assistance. While most of Microsoft's wizards are not very useful, SQL Server has for years had one that is really nice; one that captures all queries against a database and evaluates whether new indexes would improve performance, and whether existing indexes are helping. It's really cumbersome to do this with the slow_query_log and analysing EXPLAIN output.. Especially since that output isn't all that detailed (see previous)...

5. Runtime-changeable InnoDB and logging parameters. Sometimes finding the root of a performance issue is a hit-and-miss job of looking at slow queries, parallel updates, buffer pool settings and other parameters, but many of the settings you need to try require a restart to take effect. Not only does that make it impossible to even contemplate fixing something in a live installation, running benchmarks in a test deployment is also a slow and cumbersome undertaking.

At some earlier time I might also have put integrated hot backups on this list – today they only make it as a runner-up. That's not because the standard tools have become any better, but because we've developed a mechanism that works on top of storage system snapshots, so we can deal with it anyway. No, mysqldump really doesn't do the trick, thanks. Not when you're talking of databases in the range of hundreds of gigs. Another one that I might have wanted to mention was partitioned tablespaces, but I guess when 5.1 eventually becomes GA, I get to offer an opinion on that...
