<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: More File Processing with Erlang</title>
	<atom:link href="http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/feed/" rel="self" type="application/rss+xml" />
	<link>http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/</link>
	<description>Ask forgiveness, not permission.</description>
	<pubDate>Sun, 07 Sep 2008 17:31:24 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
		<item>
		<title>By: Hypothetical Labs &#187; Worse Is Better Scaling</title>
		<link>http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-116</link>
		<dc:creator>Hypothetical Labs &#187; Worse Is Better Scaling</dc:creator>
		<pubDate>Thu, 11 Oct 2007 14:54:19 +0000</pubDate>
		<guid isPermaLink="false">http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-116</guid>
		<description>[...] It&#8217;s spawned a couple of interesting threads on erlang-questions and several insightful blog [...]</description>
		<content:encoded><![CDATA[<p>[...] It&#8217;s spawned a couple of interesting threads on erlang-questions and several insightful blog [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hynek (Pichi) Vychodil</title>
		<link>http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-85</link>
		<dc:creator>Hynek (Pichi) Vychodil</dc:creator>
		<pubDate>Mon, 08 Oct 2007 06:38:02 +0000</pubDate>
		<guid isPermaLink="false">http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-85</guid>
		<description>Steve: why should bfile be faster than file?
http://pichis-blog.blogspot.com/2007/10/is-bfile-faster-than-old-erlang-file.html</description>
		<content:encoded><![CDATA[<p>Steve: why should bfile be faster than file?<br />
<a href="http://pichis-blog.blogspot.com/2007/10/is-bfile-faster-than-old-erlang-file.html" rel="nofollow">http://pichis-blog.blogspot.com/2007/10/is-bfile-faster-than-old-erlang-file.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dilip</title>
		<link>http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-47</link>
		<dc:creator>Dilip</dc:creator>
		<pubDate>Thu, 04 Oct 2007 21:03:44 +0000</pubDate>
		<guid isPermaLink="false">http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-47</guid>
		<description>FWIW Joe Cheng has a C# 3.0 version[1] of this problem that seems to perform better than Ruby (even in terms of code brevity!).  Of course it still needs PLINQ to make use of multi core/CPU hardware.

[1] http://jcheng.wordpress.com/2007/10/02/wide-finder-with-linq/#more-240</description>
		<content:encoded><![CDATA[<p>FWIW Joe Cheng has a C# 3.0 version[1] of this problem that seems to perform better than Ruby (even in terms of code brevity!).  Of course it still needs PLINQ to make use of multi core/CPU hardware.</p>
<p>[1] <a href="http://jcheng.wordpress.com/2007/10/02/wide-finder-with-linq/#more-240" rel="nofollow">http://jcheng.wordpress.com/2007/10/02/wide-finder-with-linq/#more-240</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve</title>
		<link>http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-36</link>
		<dc:creator>steve</dc:creator>
		<pubDate>Wed, 03 Oct 2007 21:09:22 +0000</pubDate>
		<guid isPermaLink="false">http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-36</guid>
		<description>Hi Pete, yes, MacBook Pro disk I/O throughput is at best about 45 MB/sec from some figures I've seen, which I think would put us in the 4-5 sec range for reading this data. If I run the Ruby solution on the full dataset, the first time it takes 7.5-8 secs, but the second time it takes 2.2-2.5 secs, which I believe shows the caching effects. But if we're getting cached data, then I think that reinforces my point about the Erlang code not being I/O bound.

Your second paragraph above has some very good insights, and your final paragraph is right on the money, IMO. I'd be really interested in seeing your benchmark results (I assume they'll be on your website?), and needless to say Tim's T2 results should be quite interesting as well.</description>
		<content:encoded><![CDATA[<p>Hi Pete, yes, MacBook Pro disk I/O throughput is at best about 45 MB/sec from some figures I&#8217;ve seen, which I think would put us in the 4-5 sec range for reading this data. If I run the Ruby solution on the full dataset, the first time it takes 7.5-8 secs, but the second time it takes 2.2-2.5 secs, which I believe shows the caching effects. But if we&#8217;re getting cached data, then I think that reinforces my point about the Erlang code not being I/O bound.</p>
<p>Your second paragraph above has some very good insights, and your final paragraph is right on the money, IMO. I&#8217;d be really interested in seeing your benchmark results (I assume they&#8217;ll be on your website?), and needless to say Tim&#8217;s T2 results should be quite interesting as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pete Kirkham</title>
		<link>http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-35</link>
		<dc:creator>Pete Kirkham</dc:creator>
		<pubDate>Wed, 03 Oct 2007 19:59:48 +0000</pubDate>
		<guid isPermaLink="false">http://steve.vinoski.net/blog/2007/09/29/more-file-processing-with-erlang/#comment-35</guid>
		<description>I do think on that particular measure you're reading from cache rather than physical disk; on my laptop it takes 8.5s if you clear the cache, 0.4s if you don't. Most laptop drives are around 30MBps; a bit more if 10,000 rpm. Flash drives are twice that, but excel at seek time.

However, that sort of number may actually be representative of the target system - if you assume a transfer rate in the 2GBps ballpark (Sun's Thumper gives 2GBps to memory), and the T2's 1.4GHz core, that gives only 0.7 CPU cycles (per hardware thread) to process each byte of data, so even the C++ code (which takes around 3 cycles per byte on my laptop) would be CPU limited rather than IO limited if you don't spread the load over the available hardware threads.  There are more estimates at the link I gave as my website.

If it's a language war then it's between Erlang and Ruby; I'm trying to find out what the VM of either language should be doing to solve this problem, so am benchmarking to find where the costs are if you write close-to-the-metal code to solve it. I wouldn't write a log file extraction script in bit-twiddly C++; I'd use Perl or XSLT. You really shouldn't have to, but performing experiments to help think about where the bottlenecks may be is useful.


Pete</description>
		<content:encoded><![CDATA[<p>I do think on that particular measure you&#8217;re reading from cache rather than physical disk; on my laptop it takes 8.5s if you clear the cache, 0.4s if you don&#8217;t. Most laptop drives are around 30MBps; a bit more if 10,000 rpm. Flash drives are twice that, but excel at seek time.</p>
<p>However, that sort of number may actually be representative of the target system - if you assume a transfer rate in the 2GBps ballpark (Sun&#8217;s Thumper gives 2GBps to memory), and the T2&#8217;s 1.4GHz core, that gives only 0.7 CPU cycles (per hardware thread) to process each byte of data, so even the C++ code (which takes around 3 cycles per byte on my laptop) would be CPU limited rather than IO limited if you don&#8217;t spread the load over the available hardware threads.  There are more estimates at the link I gave as my website.</p>
<p>If it&#8217;s a language war then it&#8217;s between Erlang and Ruby; I&#8217;m trying to find out what the VM of either language should be doing to solve this problem, so am benchmarking to find where the costs are if you write close-to-the-metal code to solve it. I wouldn&#8217;t write a log file extraction script in bit-twiddly C++; I&#8217;d use Perl or XSLT. You really shouldn&#8217;t have to, but performing experiments to help think about where the bottlenecks may be is useful.</p>
<p>Pete</p>
]]></content:encoded>
	</item>
</channel>
</rss>
