Comments on: More File Processing with Erlang

By: Hypothetical Labs » Worse Is Better Scaling

Hypothetical Labs » Worse Is Better Scaling — Thu, 11 Oct 2007 14:54:19 +0000

[…] It’s spawned a couple of interesting threads on erlang-questions and several insightful blog […]

By: Hynek (Pichi) Vychodil

Hynek (Pichi) Vychodil — Mon, 08 Oct 2007 06:38:02 +0000

Steve: why should bfile be faster than file?
http://pichis-blog.blogspot.com/2007/10/is-bfile-faster-than-old-erlang-file.html

By: Dilip

Dilip — Thu, 04 Oct 2007 21:03:44 +0000

FWIW Joe Cheng has a C# 3.0 version[1] of this problem that seems to perform better than Ruby (even in terms of code brevity!). Of course it still needs PLINQ to make use of multi-core/multi-CPU hardware.

[1] http://jcheng.wordpress.com/2007/10/02/wide-finder-with-linq/#more-240

By: steve

steve — Wed, 03 Oct 2007 21:09:22 +0000

Hi Pete, yes, MacBook Pro disk I/O throughput is best case about 45 MB/sec from some figures I’ve seen, which I think would put us in the 4-5 sec range for reading this data, best case. If I run the Ruby solution on the full dataset, the first time it takes 7.5-8 secs, but the second time it takes 2.2-2.5 secs, which I believe shows the caching effects. But if we’re getting cached data, then I think that reinforces my point about the Erlang code not being I/O bound.

Your second paragraph above has some very good insights, and your final paragraph is right on the money, IMO. I’d be really interested in seeing your benchmark results (I assume they’ll be on your website?), and needless to say Tim’s T2 results should be quite interesting as well.

By: Pete Kirkham

Pete Kirkham — Wed, 03 Oct 2007 19:59:48 +0000

I do think that on that particular measure you’re reading from cache rather than physical disk; on my laptop it takes 8.5s if you clear the cache, 0.4s if you don’t. Most laptop drives are around 30MBps; a bit more if 10,000 rpm. Flash drives are twice that, but excel at seek time.
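One rough way to see the cold-versus-warm effect described above is to time the same sequential read twice in a row. The sketch below is only an illustration, not code from anyone in this thread: `timed_read`, `throughput_mb_per_s`, and the 1 MiB chunk size are hypothetical choices, and the script does not clear the OS cache itself, since that step is platform-specific.

```python
import sys
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file sequentially; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # assumed 1 MiB chunks, not tuned
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def throughput_mb_per_s(nbytes, seconds):
    """Throughput in MB/s (1 MB = 10**6 bytes); infinite if no time elapsed."""
    if seconds == 0:
        return float("inf")
    return nbytes / 1e6 / seconds


if __name__ == "__main__":
    # Usage: python timed_read.py <logfile>
    # The first pass may hit the disk; the second is usually served from cache.
    for label in ("first pass", "second pass"):
        nbytes, seconds = timed_read(sys.argv[1])
        rate = throughput_mb_per_s(nbytes, seconds)
        print(f"{label}: {nbytes} bytes in {seconds:.2f}s ({rate:.1f} MB/s)")
```

If the second pass reports a rate far above what the drive can physically sustain, the data is almost certainly coming from memory rather than from disk.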

However, that sort of number may actually be representative of the target system – if you assume a transfer rate in the 2GBps ballpark (Sun’s Thumper gives 2GBps to memory), and the T2’s 1.4GHz core, that gives only 0.7 CPU cycles (per hardware thread) to process each byte of data, so even the C++ code (which takes around 3 cycles per byte on my laptop) would be CPU limited rather than IO limited if you don’t spread the load over the available hardware threads. There are more estimates at the link I gave as my website.
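The cycle budget above can be worked through as a back-of-the-envelope calculation. This is a sketch under stated assumptions rather than a measurement: it takes 2GBps to mean 2 × 10^9 bytes per second, assumes the roughly 3 cycles per byte seen on a laptop would carry over unchanged to a T2 hardware thread, and assumes the work splits perfectly across threads. The function names are hypothetical.

```python
import math


def cycles_per_byte(clock_hz, bytes_per_second):
    """Cycles one hardware thread can spend per byte at the given input rate."""
    return clock_hz / bytes_per_second


def threads_needed(cost_cycles_per_byte, budget_cycles_per_byte):
    """Minimum threads to keep up, assuming perfect parallel scaling."""
    return math.ceil(cost_cycles_per_byte / budget_cycles_per_byte)


# Figures from the comment: 1.4GHz core, 2GBps transfer rate.
budget = cycles_per_byte(1.4e9, 2e9)

# Assumption: the C++ cost of about 3 cycles/byte is the same on the T2.
threads = threads_needed(3.0, budget)

print(f"budget: {budget:.2f} cycles/byte per hardware thread")
print(f"threads needed at 3 cycles/byte: {threads}")
```

Under those assumptions the budget is 0.7 cycles per byte, and code costing 3 cycles per byte would need at least 5 hardware threads to keep pace with the input, which is why a single-threaded run would be CPU limited.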

If it’s a language war then it’s between Erlang and Ruby; I’m trying to find out what the VM of either language should be doing to solve this problem, so I am benchmarking to find where the costs are if you write close-to-the-metal code to solve it. I wouldn’t write a log file extraction script in bit-twiddly C++; I’d use Perl or XSLT. You really shouldn’t have to, but performing experiments to help think about where the bottlenecks may be is useful.

Pete