Sendfile for Yaws

January 5th, 2009  |  Published in erlang, performance, scalability, web, yaws

A few months back Klacke gave me committer rights for Yaws. I’ve made a few fixes here and there, including adding support for passing “*” as the request URI for OPTIONS, enabling OPTIONS requests to be dispatched to appmods and yapps, and completing a previously-submitted patch for configuring the listen backlog. Klacke has just started putting a test framework into the codebase and build system so that contributors can include tests with any patches or new code they submit, and I’ve contributed to that as well.

The biggest feature I’ve added to date, though, is a new linked-in driver that allows Yaws to use the sendfile system call on Linux, OS X, and FreeBSD. I had never written a linked-in driver before, so I was happy and fortunate to have an Erlang expert like Klacke providing hints and reviewing my code.

I did some preliminary testing that showed that sendfile definitely improves CPU usage across the board but, depending on file size, sometimes does so at the cost of increased request times. I used my otherwise idle 2-core 2.4GHz Ubuntu 8.04.1 Dell box with 2 GB of RAM, and ran Apache Bench (ab) from another Linux host to simulate 50 concurrent clients downloading a 64k data file a total of 100000 times. I saw that user/system CPU on the web host tended to run around 33%/28% without sendfile, while with sendfile it dropped to 22%/17%. The trade-off was request time, though: each request for the 64k file averaged 0.928ms with sendfile but 0.567ms without. With larger files, however, sendfile is slightly faster and still has better CPU usage. For example, with a 256k file, sendfile averaged 2.251ms per request with user/system CPU at 8%/16%, whereas it was 2.255ms and 16%/27% CPU without sendfile, which makes me wonder if the figures for the 64k file are outliers for some reason. I performed these measurements fairly quickly, so while I believe they’re reasonably accurate, don’t take them as formal results.

On my MacBook Pro laptop running OS X 10.5.6, CPU usage didn’t seem to differ much whether I used sendfile or not, but requests across the board tended to be slightly faster with sendfile.

I ran FreeBSD 7.0.1 in a Parallels VM on my laptop, and there I saw significantly better system CPU usage with sendfile than without, sometimes as much as a 30% improvement. Requests were also noticeably faster with sendfile, sometimes by as much as 17%, again depending on file size, with larger files seeing the bigger improvements. User CPU was not a factor. I don’t know how much running all this within a VM affected these numbers, though.

Given that Yaws is often used mainly for delivering dynamic content, sendfile won’t affect those cases. Still, I think it’s nice to have it available for the times when you do have to deliver file-based content, especially if the files are of the larger variety. Anyway, I committed this support to the Yaws svn repository back around December 21. If you’d like to do your own testing, please feel free — I’d be interested in learning your results. Also, if you have ideas for further tests I might try, please leave a comment to let me know.


  1. Nicholas Dronen says:

    January 5th, 2009 at 9:47 am

    My hunch is that the larger request times for smaller files are caused by sendfile blocking the calling Erlang process (and the operating system thread in which it is executing), preventing other processes from running. This is just a hunch, though. A way to find out whether this is happening is to run Apache Bench serially both with and without sendfile on the server; if my hunch is correct, I think you’ll get lower average response times for the 64k files with serially-submitted requests.

  2. steve says:

    January 5th, 2009 at 12:15 pm

    @Nicholas: sendfile itself is a blocking call where the caller waits for completion, so there’s not much we can do about that. But since Erlang uses async I/O everywhere, the linked-in driver is set up to be called back by the Erlang VM when its socket is ready for writing. When the socket is writeable, the driver calls sendfile and tries to send as much as it can, but it eventually gets EAGAIN or EWOULDBLOCK when writing to the socket would block. At that time it stores away how much data it successfully sent and returns control to the VM, waiting to be called back next time its socket is writeable. This ensures we don’t block the VM, which is very important.

    Performing the ab test serially, i.e. without simulated concurrent clients, for fetching the 64k file still shows sendfile to be slightly slower, averaging 1.506ms per request with sendfile vs. 1.445ms without. User/system CPU for this case is roughly the same, about 11%/11% with sendfile and 12%/12% without. For the 256k file, I see sendfile taking 3.283ms per request with CPU of 8%/12%, and non-sendfile taking 3.227ms per request with CPU at 11%/17%. For a 1MB file, sendfile takes 10.154ms at 4%/13% while non-sendfile takes 10.103ms at 10%/18%. There’s a difference of only 50–60 µs in each of those cases. And finally, if we go up to a 64MB file, I see sendfile at 593.2ms with CPU 1%/5% whereas non-sendfile takes 596.1ms with CPU 7%/12%; notice that sendfile’s request time is about 3ms faster for this case. So in this serialized test, sendfile always wins on CPU but is just slightly slower except for very large files.

  3. Andy says:

    January 6th, 2009 at 1:47 am

    I’m very surprised by the increase in response time when sendfile is enabled. I was always under the impression that sendfile greatly improves the performance of web servers.

    What about other web servers like Apache and lighttpd? Does their performance improve when sendfile is enabled? Maybe comparing Yaws with those servers would shed light on this.

  4. steve says:

    January 6th, 2009 at 2:04 am

    @Andy: keep in mind that the response time differences are all in all pretty small, and that the win in CPU that sendfile affords is potentially very worthwhile if you’re serving a mixture of dynamic and static content.