scalability :: Steve Vinoski's Blog

scalability

QCon London: Highly Available Systems

February 14th, 2012 | Published in availability, conferences, distributed systems, scalability

At QCon London March 7-9 I’ll be hosting a track on Highly Available Systems, which I’ll describe in more detail below.

But first, be aware that if you register for the conference using promotion code VINO100, you’ll save £100 on the registration fee, and I’ll donate £100 to the World Food Programme.

This track is compelling. As track host I focused on inviting speakers with significant experience in building and deploying real working systems that exhibit high availability, with the goal of maximizing the transfer of ideas, approaches, tools, and techniques from the speakers to the attendees.

The track kicks off with Joe Armstrong, the father of Erlang, talking about building highly-available systems with Erlang.
Next up, Mark McGranaghan of Heroku will present approaches they use at Heroku to ensure high availability.
After lunch, John Allspaw of Etsy will talk about fault tolerance, anomaly detection, and anticipation patterns. John is well known for his excellent books on capacity planning and web operations. John is also giving the Friday morning keynote, “Resilient Response In Complex Systems”.
Following John will be Jodi Moran, CTO at Plumbee, who will tell us about what it takes to build systems capable of going from zero to ten million users in 4 weeks.
We’ll wrap up the track with Martin Thompson, a specialist in high-speed, highly available, low-latency systems who helped build the LMAX Disruptor. He’ll talk about event-sourced architectures and what we’ve forgotten about high availability.

It’s gonna be great, no doubt, and I’m really looking forward to it. Hope to see you there!

I’ll also be giving a talk at the conference about distributed systems and Riak Core.

Sendfile for Yaws

January 5th, 2009 | Published in erlang, performance, scalability, web, yaws

A few months back Klacke gave me committer rights for Yaws. I’ve made a few fixes here and there, including adding support for passing “*” as the request URI for OPTIONS, enabling OPTIONS requests to be dispatched to appmods and yapps, and completing a previously-submitted patch for configuring the listen backlog. Klacke has just started putting a test framework into the codebase and build system so that contributors can include tests with any patches or new code they submit, and I’ve contributed to that as well.

The biggest feature I’ve added to date, though, is a new linked-in driver that allows Yaws to use the sendfile system call on Linux, OS X, and FreeBSD. I had never written a linked-in driver before, so I was happy and fortunate to have an Erlang expert like Klacke providing hints and reviewing my code.
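For anyone who hasn’t used sendfile, here is a minimal sketch in Python of the difference it makes. This is not the Erlang/C linked-in driver that went into Yaws. The names `serve_file_copy`, `serve_file_sendfile`, and `demo` are made up for illustration, and a local socket pair stands in for a real client connection. The first function copies the file through a user-space buffer; the second hands the job to the kernel via the standard library’s `socket.sendfile`, which uses `os.sendfile` where the platform supports it and falls back to plain `send` otherwise.

```python
import os
import socket
import tempfile
import threading


def serve_file_copy(sock, path, chunk_size=65536):
    """Conventional path: read each chunk into user space, then write it out."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def serve_file_sendfile(sock, path):
    """sendfile path: ask the kernel to move file data to the socket directly."""
    with open(path, "rb") as f:
        return sock.sendfile(f)  # returns the total number of bytes sent


def _drain(sock, chunks):
    """Stand-in for a client: read until the sender closes its end."""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)


def demo(size=64 * 1024):
    """Send the same temporary file both ways and check what arrives."""
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    results = {}
    try:
        for name, send in (("copy", serve_file_copy),
                           ("sendfile", serve_file_sendfile)):
            server_end, client_end = socket.socketpair()
            received = []
            reader = threading.Thread(target=_drain, args=(client_end, received))
            reader.start()
            sent = send(server_end, path)
            server_end.close()  # signals end-of-stream to the reader
            reader.join()
            client_end.close()
            results[name] = (sent, b"".join(received) == payload)
    finally:
        os.unlink(path)
    return results


if __name__ == "__main__":
    print(demo())
```

Both paths deliver identical bytes; the only difference is who does the copying, which is why any benefit shows up in CPU usage rather than in the data on the wire.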

I did some preliminary testing that showed that sendfile definitely reduces CPU usage across the board but, depending on file size, sometimes does so at the cost of longer request times. I used my otherwise idle 2-core 2.4GHz Ubuntu 8.04.1 Dell box with 2 GB of RAM, and ran Apache Bench (ab) from another Linux host to simulate 50 concurrent clients downloading a 64k data file a total of 100000 times. I saw that user/system CPU on the web host tended to run around 33%/28% without sendfile, while with sendfile it dropped to 22%/17%. The trade-off was request time, though: each request for the 64k file averaged 0.928ms with sendfile but 0.567ms without. With larger files, however, sendfile is slightly faster and still has better CPU usage. For example, with a 256k file, sendfile averaged 2.251ms per request with user/system CPU at 8%/16%, whereas without sendfile it was 2.255ms and 16%/27% CPU, which makes me wonder if the figures for the 64k file are outliers for some reason. I performed these measurements fairly quickly, so while I believe they’re reasonably accurate, don’t take them as formal results.
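To put the Linux numbers above side by side, here is a small Python sketch that computes the relative changes. The figures are copied from the paragraph above. The `MEASUREMENTS` table and `compare` function are just names chosen for this illustration, and adding user and system CPU together into one "busy" figure is a simplifying assumption on my part, not something the measurements were designed for.

```python
# Figures from the informal Linux test above:
# (average ms per request, user CPU %, system CPU %)
MEASUREMENTS = {
    "64k": {"sendfile": (0.928, 22, 17), "no_sendfile": (0.567, 33, 28)},
    "256k": {"sendfile": (2.251, 8, 16), "no_sendfile": (2.255, 16, 27)},
}


def compare(size):
    """Return sendfile's relative change versus no sendfile for one file size.

    Result: (latency change %, busy CPU without, busy CPU with, CPU change %).
    Negative percentages mean sendfile was lower (better).
    """
    ms_sf, user_sf, sys_sf = MEASUREMENTS[size]["sendfile"]
    ms_no, user_no, sys_no = MEASUREMENTS[size]["no_sendfile"]
    latency_change = (ms_sf - ms_no) / ms_no * 100
    cpu_sf = user_sf + sys_sf  # assumption: treat user + system as total busy CPU
    cpu_no = user_no + sys_no
    cpu_change = (cpu_sf - cpu_no) / cpu_no * 100
    return round(latency_change, 1), cpu_no, cpu_sf, round(cpu_change, 1)


if __name__ == "__main__":
    for size in MEASUREMENTS:
        print(size, compare(size))
```

Seen this way, the 64k case is the odd one out: request time goes up by roughly two thirds with sendfile, while for 256k it is essentially unchanged and combined CPU drops by well over a third in both cases.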

On my MacBook Pro laptop running OS X 10.5.6, CPU usage didn’t seem to differ much whether I used sendfile or not, but requests across the board tended to be slightly faster with sendfile.

I ran FreeBSD 7.0.1 in a Parallels VM on my laptop, and there I saw significantly better system CPU usage with sendfile than without, sometimes as much as a 30% improvement. Requests were also noticeably faster with sendfile than without, sometimes by as much as 17%, and again depending on file size, with higher improvements for larger files. User CPU was not a factor. All in all, though, I don’t know how much the fact that I ran all this within a VM affected these numbers.

Yaws is often used mainly for delivering dynamic content, and sendfile won’t affect those cases. Still, I think it’s nice to have it available for the times when you do have to deliver file-based content, especially if the files are of the larger variety. Anyway, I committed this support to the Yaws svn repository back around December 21 or so. If you’d like to do your own testing, please feel free; I’d be interested in learning your results. Also, if you have ideas for further tests I might try, please leave a comment to let me know.

RESTful Data

February 28th, 2008 | Published in column, coupling, integration, REST, scalability

In my Jan/Feb Internet Computing column, Serendipitous Reuse (PDF), I talked about interface coupling and the benefits of REST’s uniform interface constraint. I find that whenever you discuss that topic, though, REST detractors tend to say, “Well, you’re just pushing the coupling problems to the data.”

The problem with that assertion is that it assumes coupling is a fixed constant: if you eliminate it from one point, whatever you’ve gotten rid of just has to pop up somewhere else, like some sort of strange “Conservation of Coupling” law. Of course, that’s not true. In my latest column, Demystifying RESTful Data Coupling (PDF), I turn my attention to this claim and explain how RESTful data works, and why it too, like RESTful interfaces, reduces coupling when compared to WS-* and other similar approaches.

Constructive feedback welcomed, as always.

Internet Computing Call for Special Issue Proposals

January 22nd, 2008 | Published in distributed systems, integration, performance, publishing, REST, reuse, scalability, services

As you may know, I’m a columnist for IEEE Internet Computing (IC), and I’m also on their editorial board. Our annual board meeting is coming up, so to help with planning, we’ve issued a call for special issue proposals.

The topics that typically come up in this blog and others it connects to are pretty much all fair game as special issue topics: REST and the programmatic web, service definition languages, scalability issues, intermediation, tools, reuse, development languages, back-end integration, etc. Putting together a special issue doesn’t take a lot of work, either. It requires you to find 3-4 authors each willing to contribute an article, reviewers to review those articles (and IC can help with that), and a couple of others to work with you as editors. As editors you also have to write a brief introduction for the special issue. I’ve done a few special issues over the years and if you enlist the right authors, it’s a lot less work than you might think.

As far as technical magazines go, IC is typically one of the most cited, usually second only to IEEE Software, as measured by independent firms. I think one reason for this is that it has a nice balance of industry and academic articles, so its pages provide information relevant to both the practitioner and the researcher.

“Internet SOAP” vs. REST: Huh?

December 28th, 2007 | Published in distributed systems, REST, scalability, SOA

Dilip Ranganathan pointed me to a long rant from Ganesh Prasad about using SOAP at Internet scale. I see that Stu Charlton already chimed in there with some good comments and analysis, but I think there’s still more to say.

Unless I’m missing something, Ganesh seems to be saying, “Hey, if we just stick SOAP directly onto TCP, we can scale beyond Web scale to Internet scale!” Oh, if it were only so easy. I would think that it’s fairly obvious that just because TCP scales well doesn’t mean that higher-level protocols sitting on top of it automatically scale to the same degree.

Why does the Web scale so well? Because of particular constraints deliberately imposed to induce specific architectural properties. The caching constraint contributes heavily to Web scalability, for example. Statelessness and the uniform interface also play a big role there. These constraints along with conditional GET allow messages to be significantly reduced in size or better yet, eliminated altogether. The resulting scalability impact is huge.
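To make the conditional GET point concrete, here is a minimal, self-contained Python sketch using only the standard library. Everything in it is hypothetical: the 64k resource, the way the ETag is derived, and the `Handler`, `fetch`, and `demo` names are all invented for illustration. The server-side check is also deliberately naive: it does an exact string match on `If-None-Match` and ignores lists of tags, weak validators, and `*`.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 65536  # hypothetical 64k resource
ETAG = '"' + hashlib.sha256(BODY).hexdigest()[:16] + '"'  # illustrative validator


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Simplification: exact match only; real servers handle lists and "*".
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)  # Not Modified: no body is sent at all
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, etag=None):
    """GET the resource, optionally conditionally; return (status, etag, body size)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    headers = {"If-None-Match": etag} if etag else {}
    conn.request("GET", "/", headers=headers)
    resp = conn.getresponse()
    body = resp.read()
    result = (resp.status, resp.getheader("ETag"), len(body))
    conn.close()
    return result


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        first = fetch(port)                  # full response with body
        second = fetch(port, etag=first[1])  # revalidation with the cached ETag
        return first, second
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The second request still crosses the network, but the 64k body doesn’t. A cache that knows the response is still fresh wouldn’t need to send that request in the first place, which is the "eliminated altogether" case.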

Ganesh talks about a lot of the things you’d have to add to the mix to get a useful SOA ecosystem on top of SOAP/TCP, but nowhere does he talk about the specific architectural properties and constraints required to make it all scale. Without that, it just ain’t gonna happen. Furthermore, I don’t believe any system based either on interface specialization (i.e., the opposite of the uniform interface constraint) or on “processThis” can scale to Web scale. Interface specialization significantly increases coupling while reducing visibility and applicability, while “processThis” is so devoid of semantics that it offers nowhere to practically apply constraints like caching and statelessness that are so critical to scalability.
