Tim Bray and Erlang

September 23rd, 2007  |  Published in erlang  |  14 Comments  |  Bookmark on Pinboard.in

Tim’s playing with Erlang, trying to rewrite a Ruby program for analyzing his website logs. His initial reaction appeared to be one of disgust, since his Erlang program was an order of magnitude slower than the Ruby version. Thankfully, though, a bunch of people have since jumped in and made the solution more palatable (check the comments on his posting).

Reading between the lines, it seems that Tim was hoping to take advantage of Erlang’s concurrency to put his multicore machines to work analyzing his logs. Unfortunately, none of the answers posted in the blog comments seem to provide that, so I decided to take a crack at it myself.

First I wrote a tiny little Erlang program to read in Tim’s entire sample logfile using file:read_file, turn the result into a list of strings using binary_to_list, and then process the strings. Simple timings showed the binary_to_list call to be the slowest part, so I decided to throw Erlang’s multiprocess capability at it. If you’re familiar with pthreads or Java threads, think of an Erlang “process” as basically a very, very lightweight thread.

Here’s my solution:

-module(tbray).
-export([start/2]).

find_match("/ongoing/When/" ++ Last) ->
    case lists:member($., Last) of
        false -> 1;
        true -> 0
    end;
find_match(_) -> 0.

process_binary(Pid, Bin) ->
    spawn(fun() ->
        L = string:tokens(binary_to_list(Bin), "\\n"),
        V = lists:foldl(
            fun(Line, Total) ->
                Total + find_match(lists:nth(7, string:tokens(Line, " "))) end,
            0, L),
        Pid ! V
        end).

split_on_newline(Bin, N, All) when size(Bin) < N ->
    All ++ [Bin];
split_on_newline(Bin, N, All) ->
    {_, <<C:8, _/binary>>} = split_binary(Bin, N),
    case C of
        $\\n ->
          {B21, B22} = split_binary(Bin, N+1),
          split_on_newline(B22, N, All ++ [B21]);
        _ -> split_on_newline(Bin, N+1, All)
    end.
split_on_newline(Bin, N) when N == size(Bin) -> [Bin];
split_on_newline(Bin, N) -> split_on_newline(Bin, N, []).

start(Num, Input) ->
    {ok, Data} = file:read_file(Input),
    Bins = split_on_newline(Data, size(Data) div Num),
    Me = self(),
    Pids = [process_binary(Me, B) || B <- Bins],
    lists:foldl(
        fun(_, Total) -> receive X -> Total + X end end,
        0, Pids).

The way this solution works is that it uses multiple Erlang processes to convert chunks of the input file to lists of strings and process them for matches. Begin with the start/2 function at the very bottom. First, we read the file in one shot, then split it into Num chunks, with the split_on_newline function variants being mindful to end each chunk on a newline character so we don’t split lines across chunks. We then pass each chunk to the process_binary/2 function using a list comprehension. Each process_binary/2 call spawns a new process to first convert its chunk to a list of strings and then process those strings for matches.

Now let’s time it. My MacBook Pro has two cores, so let’s enable SMP, and bump the Erlang process limit up to 60,000. First, we’ll compile the module and time it with just a single process:

$ erl -smp enable +P 60000
1> c(tbray).
{ok,tbray}
2> timer:tc(tbray, start, [1, "o10k.ap"]).
{661587,1101}

OK, at 0.66 seconds, we’re already a lot faster than Tim’s approach (the second value, 1101, is the number of matches we found), but can it go faster? Let’s try 2 processes:

3> timer:tc(tbray, start, [2, "o10k.ap"]).
{472067,1101}

That dropped us to 0.47 seconds, which is not an insignificant speedup. Do more processes help?

4> timer:tc(tbray, start, [4, "o10k.ap"]).
{390786,1101}

Yes, at 4 processes we drop to 0.39 seconds. Let’s go up a few orders of magnitude:

5> timer:tc(tbray, start, [40, "o10k.ap"]).
{380753,1101}
6> timer:tc(tbray, start, [400, "o10k.ap"]).
{322979,1101}
7> timer:tc(tbray, start, [4000, "o10k.ap"]).
{316857,1101}
8> timer:tc(tbray, start, [40000, "o10k.ap"]).
{318153,1101}

As we increase the number of Erlang processes, our performance improves, up to a point. At 40,000 processes we’re slower than we were at 4000. Maybe there’s a better number in between? It turns out that despite the numbers listed above, once you get above 400 processes or so, the numbers remain about the same. The best I got on my MacBook Pro after numerous runs was 0.301 seconds with 2400 processes, but the average best seems to be about 0.318 seconds. The performance of this approach comes pretty close to other solutions that rely on external non-Erlang assistance, at least for Tim’s sample dataset on this machine.

I also tried it on an 8-core (2 Intel Xeon E5345 CPUs) 64-bit Dell box running Linux, and it clocked in at 0.126 seconds with 2400 processes, and I saw a 0.124 seconds with 1200 processes. I believe this utilization of multiple cores was exactly what Tim was looking for.

If you’re a Java or C++ programmer, note the ease with which we can spawn Erlang processes and have them communicate, and note how quickly we can launch thousands of processes. This is what Tim was after, I believe, so hopefully my example provides food for thought in that area. BTW, I’m no Erlang expert, so if anyone wants to suggest improvements to what I’ve written, please feel free to comment here.

Responses

  1. Kevin H says:

    September 24th, 2007 at 8:30 am (#)

    Did you concat Tim’s download logfile onto itself 100x?

    Several people commenting on Tim’s blog seemed to miss this point as well. The dataset Tim was testing with was 1,167,948 lines of data. The download he provided was 10,000 lines of data.

    The Ruby program could process 1,167,948 lines of data in 3.47 seconds. If your fastest time on 10,000 lines of data was 0.124 seconds, and if the slowdown is linear with an increase of input data, then your erlang program is looking to be ~3x slower than Ruby.

  2. steve says:

    September 24th, 2007 at 9:44 am (#)

    Hi Kevin, no, I’m not missing that point at all. My purpose for this code was to show one possible way that Erlang’s SMP capability might be used to gain performance, not to claim victory. When you throw in the full-size dataset, the regular expression matching, and the sorting, all the Erlang solutions I’ve seen so far, including my own, still fall well short of the Ruby version. I’m also a Ruby fan, so that doesn’t bother me. :-)

    One problem with my approach is that it struggles, at least on my laptop, when trying to read in the full dataset file at once. Perhaps if you combined my code with Bob Ippolito’s “cat port” solution, you’d avoid having to read the whole file at once, and thus be able to utilize multiple cores even for the large dataset. I’ll have to try that.

  3. John Davies says:

    September 24th, 2007 at 5:17 pm (#)

    Hi Steve, it’s interesting that Erlang is starting to gain exposure in the banking world. Ruby has had the side stage for some time now but never really made it into production. Erlang however seems to have done a little better in what seems like a shorter time.

    I’ve heard that Erlang is slow but I don’t think that’s an issue, surely the advantage of a DSL is it’s efficiency and elegance in solving domain problems. I don’t think the task above is really Erlang’s domain but it’s great to see what you’ve done with it in this case. How about trying a version in PERL.

    -John-

  4. Victor says:

    September 25th, 2007 at 2:47 am (#)

    If raw performance is that important, I think that it is also worth to try MLTon + ConcurrentML or OCaml+ either CamlP3l or JoCaml.

    Nevertheless an interesting problem to try to solve it…

  5. Bryan O'Sullivan says:

    September 25th, 2007 at 7:09 pm (#)

    I wrote an alternative implementation in Haskell, and it can process 1 line per microsecond with two cores: http://www.serpentine.com/blog/2007/09/25/what-the-heck-is-a-wide-finder-anyway/

  6. Tim says:

    September 26th, 2007 at 12:11 am (#)

    Steve, these two lines are eating my brain:

    split_on_newline(Bin, N) when N == size(Bin) -> [Bin];

    split_on_newline(Bin, N) -> split_on_newline(Bin, N, []).

    When I call

    Bins = split_on_newline(Data, size(Data) div Num),

    suppose Num is 1. Don’t I just get the data back? What am I missing?

  7. Pete Kirkham says:

    September 26th, 2007 at 9:11 am (#)

    $ grep “GET /ongoing/When/” o10k.ap | wc -l
    1853

    You may be missing matches that span the chunks you’re splitting into. POITROAL

  8. steve says:

    September 26th, 2007 at 9:21 am (#)

    Hi Tim, you are correct — if Num is 1, you get the Data back, but as a list of one element so that it works in the list comprehension two lines below, where process_binary is called. When Num is greater than 1, the second of the two lines you mention just starts the whole recursion on split_on_newline/3, initializing the list accumulator to the empty list [] and passing it as the third argument. split_on_newline/3 then proceeds to fill in that list until it gets to the final part when size(Bin) < N, which just tacks [Bin] onto the end of All and returns the whole list of binaries, ready to be processed.

  9. steve says:

    September 26th, 2007 at 10:04 am (#)

    Hi Pete, no, there are no matches being missed that span chunks. The whole point of the split_on_newlines functions is to ensure that all chunks break only on newlines, just as in Tim’s original Ruby code.

  10. Pete Kirkham says:

    September 26th, 2007 at 10:36 am (#)

    Sorry, I thought it was performing a similar function to the other code people had posted – either just counting matches, or counting occurances of each URL request as per the original code.

    case lists:member($., Last) of
    false -> 1;
    true -> 0
    end;

    so you’re only counting unique URLs, and not reporting on the number of requests to the most popular URLs?

  11. steve says:

    September 26th, 2007 at 10:48 am (#)

    Hi Pete, that find_match code came directly from Tim’s original code at <http://www.tbray.org/ongoing/When/200x/2007/09/22/Erlang>. See his process_match function.

  12. Pete Kirkham says:

    September 26th, 2007 at 10:54 am (#)

    Ahh, worked that’s the [^ .] at the end of the regex. Sorry for being a bit dense – first time trying to read erlang, and assumed from perl it $. was a variable not a character.

  13. Bill Burke says:

    September 26th, 2007 at 4:40 pm (#)

    Thank you Doug Lea for keeping Java relevant, but still boring ;-)

  14. JavaSPEKTRUM Blogosphäre » Blog Archiv » Blogosphäre (aus JavaSPEKTRUM 06/07) says:

    October 16th, 2007 at 7:16 am (#)

    […] liefert eine Implementierung in C++, Microsoft-Guru und SOAP-Vater Don Box kontert mit C#. Die erste Erlang-Lösung von CORBA-Guru Steve Vinoski funktioniert am besten mit 2400 (!) Erlang-Prozessen, liest allerdings […]