Sunday, March 29, 2009

On Benchmarking

Sigh. It must be that time of year again. Another partially-completed Ruby implementation has started to get overhyped because of early performance numbers.


MacRuby has been mentioned on this blog before. It's a reimplementation of Ruby 1.9 targeting the Objective-C runtime--and now, targeting LLVM for immediately compiling Ruby code to native code. Initial performance results running some of my benchmark show an interesting mixed bag. For some, MacRuby's new "experimental" branch performs very well, in some cases a few times faster than JRuby. For others, performance is slow enough there must be something wrong. And there's a large number of my benchmarks that don't even run, due to broken features they'll be fixing over the next several months.

And yet, at least one Rubyist has already seen fit to declare MacRuby "the fastest Ruby implementation around". Really? When it's crashing for about half the scripts I ran and extremely slow for many others?

He bases this assertion on running the benchmarks MacRuby includes in its own repository. Because MacRuby usually performs much better on those benchmarks than Ruby 1.9, he has decided they're now "the fastest Ruby". Do we have to do this hype dance every year?

Look, I know I'm biased. I want JRuby to be the best Ruby implementation possible. I want it to be fast, and if possible, the fastest. I also want it to run existing Ruby applications and integrate well with Java libraries and applications and continue to be one of the best choices for running Ruby. So I can understand that it sounds like I'm throwing stones by pouring water on such a breathless proclamation as "fastest Ruby implementation around". But seriously guys...haven't we learned anything?

MacRuby's experimental branch is just that: experimental. Lots of stuff is fast, but lots of stuff is broken or slow. I'm sure the MacRuby guys are going to get everything resolved and working, and I'll admit these early results drive me to work on JRuby performance even harder. But I also know from experience that many of the missing features are exactly those that make Ruby performance a really difficult problem. That's why we've always focused on compatibility first (almost to a fault); it's really easy to paint yourself into a corner.

But this post isn't about MacRuby. They're doing awesome work, and I have no doubt at least some of the performance numbers will stick. This post is about the evils of benchmarking, especially prematurely.

Around this time last year, MagLev (Ruby based on the Gemstone VM) posted some crazy benchmarks and shocked the Ruby world at RailsConf. They had numbers even more stunning than MacRuby, running some simple numerical benchmarks orders of magnitude faster than either Ruby 1.8 or 1.9. Several Ruby bloggers immediately posted not just their enthusiasm, but their belief that MagLev had won the performance battle without ever firing a shot.

And I believe it was a great disservice to the MagLev team.

MagLev was, last spring, a very primitive and early implementation. It could run some useful Ruby code, but the majority of the core classes had not yet been implemented and very little work had been done on compatibility. Now we're approaching a year later, and MagLev is still in development, still closed source, still at a private alpha stage of life. Again, I'll admit I'm biased, so I need to state that I believe MagLev is also a really cool technology, at least as cool as MacRuby or JRuby. In many ways and for many domains both of them are going to be more compelling than JRuby, and I have no illusions that JRuby will never get leapfrogged in performance. But we need to remember a really important fact: these implementations are not done.

I could post blog entries with every experimental branch of JRuby I've ever tested. I could show you "fib" numbers 3-5 times faster than current JRuby and 10 times faster than Ruby 1.9. But honestly, what would be the point? I know it's experimental, I know we need to get there in a careful, measured way, and I know that my best experiments may never be reflected in real-world, real-application performance. And yet it seems like people just love to latch on to these early contenders, hyping them to death almost before they're out of the starting gate.

Listen, people: Ruby is hard to implement. Oh, it may look easy at a glance, and you can probably get 70, 80, or even 90% of the way pretty quickly. But there's some crazy stuff in that last 10% or 5% that totally blindsides you if you're not looking for it. An early Ruby implementation has not run that last mile of Ruby implementation, and it takes almost as much work to get there as it does to run the first 90%.

So let's try to be adults about this and give new implementations time to actually finish before we whip the community into a frenzy. Every time we go overboard in our declarations, we look like amateurs. And as certain as I am that MacRuby is going to be a major contender for the "fastest Ruby" crown, I think we'd be wise to hold judgment until it and other young Ruby implementations are actually finished.