Faster WF Still
October 21st, 2007 | Published in erlang, performance | 5 Comments | Bookmark on Pinboard.in
OK, so you could say that I’m a bit obsessed with Tim Bray’s Wide Finder project. Just a little. I mean, I started this blog just a few weeks ago and so far almost every posting has been about it:
- Tim Bray and Erlang
- More File Processing with Erlang
- One More Erlang Wide Finder
- OK, Just One More WF
There’s also my Sept./Oct. Internet Computing column, Concurrency with Erlang, and sometime in early November, my next column, entitled Reliability with Erlang, will be published. Neither column is connected at all to the Wide Finder, but they just further reveal my current obsession with Erlang.
In my previous post I described my fastest solution up to that point, but here’s an even faster one: tbray16.erl
, which is identical in every way to tbray15.erl
from my previous post except that it uses wfbm4.erl
, which provides all the performance gains. This version of Boyer-Moore searching includes two simple tweaks:
- Uses hard-coded constants for string lengths, since the strings are fixed, rather than constantly recalculating them with
length/1
. - Fixes a nagging problem with my Boyer-Moore implementation where it wasn’t handling repeated characters in the fixed pattern very well. What I did in the previous version was choose the lesser of the two shifts if both characters appeared in the pattern, which worked but isn’t technically correct, and it also meant two
dict
lookups to get the shift values rather than just one. Now, I do the right thing: just keep track of the number of comparisons and subtract that from the shift value, use that if the result is positive and non-zero, otherwise just shift by 1.
This version shaves another whole second off the previous version for about a 25% speedup. The fastest I’ve seen it run on my 8-core Linux box is:
real 0m3.107s
user 0m16.243s
sys 0m2.134s
Meanwhile, Anders Nygren has been exploring eliminating using the dict
altogether for the shift value lookup, but nothing I’ve tried there has been an improvement. But thanks to Anders for prompting me to properly fix that Boyer-Moore code and at least eliminate one of the dict
lookups.
October 22nd, 2007 at 9:20 pm (#)
Where can I find your most recent code? In the Erlang mailing list you claim to have obtained the following results:
real 0m1.904s
user 0m7.917s
sys 0m1.185s
Very exciting stuff.
October 22nd, 2007 at 10:09 pm (#)
Hi Fredrik, yes, that’s a real result. Some of the code isn’t mine, though, so I’ve contacted its author to see if it’s OK if I post it here. Hopefully he’ll agree to that!
October 23rd, 2007 at 11:25 am (#)
taking about wide finder project: you as as many other experts, say something like this: “be pragmatic and you must use the right tool for the right problem”, here erlang is not the right tool for this kind of problem, I love erlang but I have to admit that this is not his terriory, I love C++ and I know this is his territoty: a program behind the scenes that needs the best performance possible, and because it will be short it’ll be not too complex;
So why waste the time improving something that cannot be improve or if it is done, the whole effort break the benefit: easy and quick development;
what you are doing with all this is put dirt on erlang with no reason at all, is like Im saying: C++ is useless because I cant load dynamic classes on it, C++ is not meant to do that, it has other responsabilties, so erlang has other responsabilities.
October 23rd, 2007 at 2:23 pm (#)
Miguel: keep in mind Tim Bray’s original goal for Wide Finder, which was to make the best possible use of the many cores available on his high-end machine. If the exercise were to simply tear through a text file as quickly as possible, count occurrences of the 7th field, and then display the top ten, then yes, sticking with Python, Perl, or Ruby would be simple, and in fact would be boring.
Given that we now have an Erlang solution that really isn’t all that complicated, runs in 1.9s on an 8-core system, and can go even faster on a 64-core box, I’d say the efforts that I and a few others have put into this have been well worth it. Perhaps when Tim publishes his final results, you’ll reconsider your opinion.
I use multiple languages every single day, am a big fan of numerous languages, and I appreciate their various strengths and weaknesses. But even so, what’s wrong with experimenting with them? Consider my efforts as my own way of learning more about Erlang and helping to promote a language that is quickly becoming one of my all-time favorites. Erlang is not a large language, yet it’s provided numerous avenues for solving this problem, while other languages quickly ran out of steam. In the end, a language that many claim is not suited to this problem actually turns out to solve it extremely well according to Tim’s original goals. Yet if we had started out listening to those claims, nobody would have bothered discovering that Erlang is a good fit here after all.
And finally, contrary to what you claim, I know for a fact that my own interest in Erlang and what I’ve written about it has gotten others interested as well. It’s all been positive.
October 23rd, 2007 at 8:09 pm (#)
I didn’t mean “you” as a one person in particular, I did mean “you” in plural(all the people who are involve in all this of WF) you must know that English is not my first language and I certainly slip my fingers…I think; but thanks for the reply, you know sometimes we got tired of this business and we didn’t realize that we are not moving forward, we always must go on(and I always try to do it), and a conversations like this is like a slap in your face and says: “ehhy man, wake up you are getting old with that attitude” anyway I love erlang and I enjoy programing on it and nothing is gonna changes that, I find it very sound, very mature or maybe is simply because I like it.. you know talking about erlang: the book of joe armstrong, is interesting for several reasons, but one thing that is important for me and nobody seems to notice, is that despite is about the language per se, is a good book of functional programming(at least the first part and although is not extensive one), is the only book of functional style that tells me exactly what a functional style is in few bunch of pages,(maybe I didn’t read too much) and since then I really started to appreciate that style(although I know is not the silver bullet)