Faster WF Still

October 21st, 2007  |  Published in erlang, performance  |  5 Comments  |  Bookmark on Pinboard.in

OK, so you could say that I’m a bit obsessed with Tim Bray’s Wide Finder project. Just a little. I mean, I started this blog just a few weeks ago and so far almost every posting has been about it:

There’s also my Sept./Oct. Internet Computing column, Concurrency with Erlang, and sometime in early November, my next column, entitled Reliability with Erlang, will be published. Neither column is connected at all to the Wide Finder, but they just further reveal my current obsession with Erlang.

In my previous post I described my fastest solution up to that point, but here’s an even faster one: tbray16.erl, which is identical in every way to tbray15.erl from my previous post except that it uses wfbm4.erl, which provides all the performance gains. This version of Boyer-Moore searching includes two simple tweaks:

  • Uses hard-coded constants for string lengths, since the strings are fixed, rather than constantly recalculating them with length/1.
  • Fixes a nagging problem with my Boyer-Moore implementation where it wasn’t handling repeated characters in the fixed pattern very well. What I did in the previous version was choose the lesser of the two shifts if both characters appeared in the pattern, which worked but isn’t technically correct, and it also meant two dict lookups to get the shift values rather than just one. Now, I do the right thing: just keep track of the number of comparisons and subtract that from the shift value, use that if the result is positive and non-zero, otherwise just shift by 1.

This version shaves another whole second off the previous version for about a 25% speedup. The fastest I’ve seen it run on my 8-core Linux box is:

real    0m3.107s
user    0m16.243s
sys     0m2.134s

Meanwhile, Anders Nygren has been exploring eliminating using the dict altogether for the shift value lookup, but nothing I’ve tried there has been an improvement. But thanks to Anders for prompting me to properly fix that Boyer-Moore code and at least eliminate one of the dict lookups.

Responses

  1. Fredrik says:

    October 22nd, 2007 at 9:20 pm (#)

    Where can I find your most recent code? In the Erlang mailing list you claim to have obtained the following results:

    real 0m1.904s
    user 0m7.917s
    sys 0m1.185s

    Very exciting stuff.

  2. steve says:

    October 22nd, 2007 at 10:09 pm (#)

    Hi Fredrik, yes, that’s a real result. Some of the code isn’t mine, though, so I’ve contacted its author to see if it’s OK if I post it here. Hopefully he’ll agree to that!

  3. miguel rodriguez says:

    October 23rd, 2007 at 11:25 am (#)

    taking about wide finder project: you as as many other experts, say something like this: “be pragmatic and you must use the right tool for the right problem”, here erlang is not the right tool for this kind of problem, I love erlang but I have to admit that this is not his terriory, I love C++ and I know this is his territoty: a program behind the scenes that needs the best performance possible, and because it will be short it’ll be not too complex;

    So why waste the time improving something that cannot be improve or if it is done, the whole effort break the benefit: easy and quick development;

    what you are doing with all this is put dirt on erlang with no reason at all, is like Im saying: C++ is useless because I cant load dynamic classes on it, C++ is not meant to do that, it has other responsabilties, so erlang has other responsabilities.

  4. steve says:

    October 23rd, 2007 at 2:23 pm (#)

    Miguel: keep in mind Tim Bray’s original goal for Wide Finder, which was to make the best possible use of the many cores available on his high-end machine. If the exercise were to simply tear through a text file as quickly as possible, count occurrences of the 7th field, and then display the top ten, then yes, sticking with Python, Perl, or Ruby would be simple, and in fact would be boring.

    Given that we now have an Erlang solution that really isn’t all that complicated, runs in 1.9s on an 8-core system, and can go even faster on a 64-core box, I’d say the efforts that I and a few others have put into this have been well worth it. Perhaps when Tim publishes his final results, you’ll reconsider your opinion.

    I use multiple languages every single day, am a big fan of numerous languages, and I appreciate their various strengths and weaknesses. But even so, what’s wrong with experimenting with them? Consider my efforts as my own way of learning more about Erlang and helping to promote a language that is quickly becoming one of my all-time favorites. Erlang is not a large language, yet it’s provided numerous avenues for solving this problem, while other languages quickly ran out of steam. In the end, a language that many claim is not suited to this problem actually turns out to solve it extremely well according to Tim’s original goals. Yet if we had started out listening to those claims, nobody would have bothered discovering that Erlang is a good fit here after all.

    And finally, contrary to what you claim, I know for a fact that my own interest in Erlang and what I’ve written about it has gotten others interested as well. It’s all been positive.

  5. miguel rodriguez says:

    October 23rd, 2007 at 8:09 pm (#)

    I didn’t mean “you” as a one person in particular, I did mean “you” in plural(all the people who are involve in all this of WF) you must know that English is not my first language and I certainly slip my fingers…I think; but thanks for the reply, you know sometimes we got tired of this business and we didn’t realize that we are not moving forward, we always must go on(and I always try to do it), and a conversations like this is like a slap in your face and says: “ehhy man, wake up you are getting old with that attitude” anyway I love erlang and I enjoy programing on it and nothing is gonna changes that, I find it very sound, very mature or maybe is simply because I like it.. you know talking about erlang: the book of joe armstrong, is interesting for several reasons, but one thing that is important for me and nobody seems to notice, is that despite is about the language per se, is a good book of functional programming(at least the first part and although is not extensive one), is the only book of functional style that tells me exactly what a functional style is in few bunch of pages,(maybe I didn’t read too much) and since then I really started to appreciate that style(although I know is not the silver bullet)