More Erlang Web Server Benchmarking

May 18th, 2011  |  Published in erlang, performance, testing, web  |  2 Comments

In my previous blog entry I questioned the value of most web server benchmarking, particularly as it relates to Erlang. Typical benchmarks are misleading, inaccurate, and poorly executed. Perhaps worse, the intent of publishing them seems to be to assert that the fastest web server (at least according to the tests performed) must of course also be the best web server. You’d think the flaws of this fallacy would be so obvious that nobody would fall for it, but think again: watching the delicious “erlang” tag over the past few days revealed that the benchmarks my blog post referred to were among the most bookmarked Erlang-related pages during that timeframe.

Not surprisingly, though, it looks like I’m not the only one bothered by poor benchmarking practices. Over on his blog, Mark Nottingham just published a brilliant set of rules for HTTP load testing. It’s quite instructive to take your favorite set of published web server benchmarks and see just how many of Mark’s rules they violate.

As I hinted last time, if you want benchmarks, you are best off by far running them yourself. That way, they’re far more likely to be relevant to the problems you’re addressing, and you can run them in a similar, or even the same, environment as the one on which you plan to deploy. You can also gear the benchmarks to much more closely resemble your applications and the loads you require them to handle. Doing the benchmarking work yourself gives you valuable hands-on experience with the servers and frameworks you’re considering, allowing you to get a feel for important factors such as feature completeness and correctness, ease of development, flexibility, and ease of deployment and runtime management/monitoring, none of which can be gauged from someone else’s performance benchmarks. Finally, by doing your own benchmarking you can also help ensure the validity and usefulness of your results by following Mark’s load testing rules.
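Running your own benchmarks doesn’t require much machinery. As a minimal sketch of the idea (not any particular tool, and certainly not a substitute for following Mark’s rules), the following Python script times concurrent GET requests and reports basic latency statistics. The local test server, URL, request count, and concurrency level are all placeholder assumptions; in practice you’d point this at your real deployment and shape the load to match your application.

```python
# Minimal do-it-yourself HTTP latency benchmark (sketch).
# The target server, URL, request count, and concurrency are assumptions:
# replace them with your own application and the load profile you care about.
import http.server
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def start_test_server(port=8123):
    # Stand-in for the server under test; replace with your real deployment.
    handler = http.server.SimpleHTTPRequestHandler
    srv = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv


def timed_get(url):
    # Time one full request/response cycle, including reading the body.
    t0 = time.perf_counter()
    with urlopen(url) as resp:
        resp.read()
        ok = resp.status == 200
    return ok, time.perf_counter() - t0


def run_benchmark(url, requests=50, concurrency=5):
    # Issue `requests` GETs with `concurrency` worker threads.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, [url] * requests))
    latencies = sorted(dt for ok, dt in results if ok)
    return {
        "completed": len(latencies),
        "median_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[int(0.95 * len(latencies)) - 1] * 1000,
    }


if __name__ == "__main__":
    srv = start_test_server()
    try:
        print(run_benchmark("http://127.0.0.1:8123/"))
    finally:
        srv.shutdown()
```

Even a toy harness like this makes it easy to vary concurrency, payloads, and request mixes to resemble your actual workload, which is exactly what published benchmarks can’t do for you.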


  1. DeadZen says:

    May 19th, 2011 at 7:41 am (#)

    How many rules were broken?

  2. steve says:

    May 19th, 2011 at 8:18 am (#)

    @DeadZen: I don’t know the answer to that question, as those benchmarks were created before I started using Erlang or Yaws. I don’t know any more details of how they were created or run than what’s described on that page. If you’ve ever heard any of my presentations where I mention these benchmarks, you’ll already know that I tend to classify them as interesting but no longer relevant. You’ll also find that the Yaws website does not refer to these benchmarks, and in fact displays no benchmarks of any kind.

    When I first started evaluating Yaws I found that page and tried to recreate the results on a smaller scale, and I got results that were somewhat similar but definitely not the same. The important point, though, is that rather than just blindly accepting those benchmarks, I ran my own benchmarks that tested the characteristics I sought.

    If my past two blog postings have not been enough to convey my opinion of performance benchmarks, then you might want to read this article (pdf) I wrote 8 years ago. That article applied to enterprise middleware, but it’s relevant to this discussion as well.