Archive for October, 2007

Reliability with Erlang

October 31st, 2007  |  Published in column, erlang

You’ve probably heard a lot by now about Erlang’s concurrency capabilities, and in fact that’s what I covered in my Sept./Oct. Internet Computing “Toward Integration” column, Concurrency with Erlang. But concurrency is only part of the story — Erlang also provides outstanding support for building highly-reliable software. My latest column, Reliability with Erlang, first describes some of the problems that highly-reliable systems face, and then explains some of Erlang’s core primitives that provide a solid foundation for reliable systems. It’s available in HTML and PDF formats.

BTW, I can’t recommend enough that you pick up a copy of Joe Armstrong’s Programming Erlang. It’s a truly excellent book that has deeply and positively affected the way I think about designing, building, and implementing software systems. As I mentioned in my columns, my only disappointment with Erlang is that I didn’t discover it 10 years ago when it was first open-sourced, as it could have saved me a ton of time and trouble in my middleware development efforts over the years.

There’s No Hope For IT

October 29th, 2007  |  Published in dynamic languages, enterprise, REST, services

When you read stuff like this, you can’t help but feel that IT is, without a doubt, doomed.

The gist of the posting is

  • REST is too hard for the average developer
  • Dynamic languages are too hard for the average developer

Along these same lines, a couple of folks told me in person that my recent blog entries about REST, dynamic languages, and ESBs were misguided because today’s enterprises are interested only in approaches, frameworks and platforms that allow average developers to produce quality systems.

This all strikes me as nothing but wrong-headed thinking.

On the dynamic language front, the worst code I have seen in my career has always, always, always been in compiled imperative languages, most often Java and C++. I would much rather let an average developer loose with a dynamic language, because the surface area is smaller, there’s a lot less rope available for self-hanging, and if they’re going to fail, they’ll fail way faster and thus allow much more time for recovery. The fact that dynamic language programs are usually smaller than their compiled counterparts means that they’re easier to read and review, and statistically, they’re likely to have fewer bugs. Furthermore, counting on the static language compiler to save you is simply wishful thinking. To paraphrase Tim Ewald from a conversation he and I had during lunch a week or so ago, compilation really amounts to just another unit test.

On the REST front, if you’re claiming that it’s harder than the alternatives, to me that’s just a sign that you don’t understand it. Is REST simple? No, but neither is SOA. However, unlike SOA, which is fairly wishy-washy, noncommittal, and loose, REST’s constraints provide real, actual guidance for developers, and those same constraints also provide opportunities for significant flexibility, extensibility, performance, scalability, and serendipity. SOA’s contracts come with no rules or constraints, and thus can easily result in a system that’s extremely brittle, tightly-coupled, and virtually impossible to upgrade. SOA itself isn’t inherently bad, as it’s certainly a step above the “every application for itself” mode of development that’s so widely practiced. Unlike REST, though, SOA doesn’t go nearly far enough to provide real, useful guidance to the poor developer who has to actually write the stuff, make it work, and keep it running.

And finally, regarding the overall notion that enterprises cater only to average developers, I’m not sure I agree. In my former life I met countless enterprise developers who were extremely sharp. While I have no doubt that there are numerous bean-counting CIOs and middle IT managers out there who think they can build high-quality IT systems with low-quality developers, at the end of the day, businesses generally know better than to think they can get something for nothing. Or to put it another way, they know they get what they pay for, and if they pay only for average developers or worse, they’ll get only average software and average systems, or worse. That’s a no-brainer.

If you’re in a position of technical leadership or project management and you’re asked to come in ahead of schedule and under budget, my advice is that you’re generally more likely to succeed with REST and dynamic languages than with the alternatives because their inherent constraints allow for better focus. Also, if you find yourself in such a position, you owe it to yourself and your team to continually lobby your superiors to help them understand the very real costs of their budgetary stinginess.

Ron Schmelzer on ESBs

October 24th, 2007  |  Published in integration, services

A little over a week ago, Ron Schmelzer of ZapThink, who’s pretty well known as an expert SOA analyst, quietly snuck an interesting comment into the ESB brouhaha that developed here recently. The ESB proponents who expressed displeasure at my view of ESBs, especially those who quoted ZapThink in their defense, will want to read what Ron had to say. If you don’t feel like chasing that link, here’s what he said:

The poster (Curt) who says that ZapThink says that ESBs are an enabling technology on the road to SOA has mischaracterized our position. Speaking from ZapThink’s perspective, we don’t believe that ESBs are neither necessary nor sufficient to enable SOA. In fact, we’ve seen plenty of SOA solutions that leverage a wide variety of non-ESB infrastructure. To be as unambiguous as possible: ESB is vendor marketing spin. True, there is certainly capabilities within an ESB that *might* enable companies to produce truly loosely coupled, composite, and heterogeneous Services in an environment of continuous change, but you can just as easily build tightly-coupled, proprietary, point-to-point Service integration with ESBs. There’s nothing about an ESB that substitutes for the need to do architecture. And there’s nothing about architecture that requires the adherence to a particular technological infrastructure.

If you want to make SOA work in a heterogeneous environment, why would you want to limit yourself to one technology, one approach? You’re buying right into their strategy of locking you into a platform. That’s only good if you sell platforms. Wake up folks – architecture is YOUR responsibility, not that of some vendors hawking middleware!

So, don’t put ZapThink in the camp of the ESB bigots. We certainly are not. Implement SOA with intermediaries and REST. Why not?

Faster WF Still

October 21st, 2007  |  Published in erlang, performance

OK, so you could say that I’m a bit obsessed with Tim Bray’s Wide Finder project. Just a little. I mean, I started this blog just a few weeks ago and so far almost every posting has been about it.

There’s also my Sept./Oct. Internet Computing column, Concurrency with Erlang, and sometime in early November, my next column, entitled Reliability with Erlang, will be published. Neither column is connected at all to the Wide Finder, but they just further reveal my current obsession with Erlang.

In my previous post I described my fastest solution up to that point, but here’s an even faster one: tbray16.erl, which is identical in every way to tbray15.erl from my previous post except that it uses wfbm4.erl, which provides all the performance gains. This version of Boyer-Moore searching includes two simple tweaks:

  • Uses hard-coded constants for string lengths, since the strings are fixed, rather than constantly recalculating them with length/1.
  • Fixes a nagging problem with my Boyer-Moore implementation where it wasn’t handling repeated characters in the fixed pattern very well. What I did in the previous version was choose the lesser of the two shifts if both characters appeared in the pattern, which worked but isn’t technically correct, and it also meant two dict lookups to get the shift values rather than just one. Now, I do the right thing: just keep track of the number of comparisons and subtract that from the shift value, use that if the result is positive and non-zero, otherwise just shift by 1.
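The second tweak can be sketched roughly as follows. This is not the code from wfbm4.erl, only a minimal illustration of the rule described above; the module name `bm_sketch` and all of its function names are hypothetical. It builds a dict-based shift table, compares the pattern against the text from right to left, and on a mismatch does a single dict lookup, subtracts the number of successful comparisons, and falls back to a shift of 1 when the result is not positive.

```erlang
-module(bm_sketch).
-export([shift_table/1, shift/4, find/2]).

%% Build the shift table: for each byte in the pattern, the distance from
%% its rightmost occurrence to the end of the pattern. Later occurrences
%% overwrite earlier ones, so repeated characters keep the smallest distance.
shift_table(Pattern) when is_binary(Pattern) ->
    Len = byte_size(Pattern),
    Indexed = lists:zip(binary_to_list(Pattern), lists:seq(0, Len - 1)),
    lists:foldl(fun({Ch, I}, D) -> dict:store(Ch, Len - 1 - I, D) end,
                dict:new(), Indexed).

%% One dict lookup per mismatch. Compared is the number of bytes that
%% already matched (counting from the right) before the mismatch on Ch.
shift(Ch, Compared, Table, PatLen) ->
    Base = case dict:find(Ch, Table) of
               {ok, S} -> S;
               error -> PatLen          % byte does not occur in the pattern
           end,
    erlang:max(1, Base - Compared).

%% Return the offset of the first occurrence of Pattern in Text, or nomatch.
find(Text, Pattern) ->
    find(Text, Pattern, shift_table(Pattern), byte_size(Pattern), 0).

find(Text, _Pattern, _Table, PatLen, Pos) when Pos + PatLen > byte_size(Text) ->
    nomatch;
find(Text, Pattern, Table, PatLen, Pos) ->
    case compare(Text, Pattern, Pos, PatLen - 1, 0) of
        match ->
            Pos;
        {mismatch, Ch, Compared} ->
            Next = Pos + shift(Ch, Compared, Table, PatLen),
            find(Text, Pattern, Table, PatLen, Next)
    end.

%% Compare right to left, counting successful comparisons.
compare(_Text, _Pattern, _Pos, J, _Compared) when J < 0 ->
    match;
compare(Text, Pattern, Pos, J, Compared) ->
    T = binary:at(Text, Pos + J),
    case binary:at(Pattern, J) of
        T -> compare(Text, Pattern, Pos, J - 1, Compared + 1);
        _ -> {mismatch, T, Compared}
    end.
```

For the pattern `<<"abcab">>`, for example, a mismatch on `$a` after two successful comparisons gives a table value of 1, and 1 - 2 is not positive, so the sketch shifts by 1. The sketch operates on binaries and computes the pattern length once per search; the hard-coded length constants from the first tweak are not shown.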

This version shaves another whole second off the previous version for about a 25% speedup. The fastest I’ve seen it run on my 8-core Linux box is:

real    0m3.107s
user    0m16.243s
sys     0m2.134s

Meanwhile, Anders Nygren has been exploring eliminating the dict altogether for the shift value lookup, but nothing I’ve tried there has been an improvement. Still, thanks to Anders for prompting me to properly fix that Boyer-Moore code and eliminate at least one of the dict lookups.

OK, Just One More WF

October 18th, 2007  |  Published in erlang, performance

When writing my previous post I silently hoped I was finished contributing more Erlang solutions to Tim Bray’s Wide Finder project. Tim already told me my code was running really well on his T5120, and yet in my last post, I nearly doubled the speed of that code, so I figured I was in good shape. But then Caoyuan Deng came up with something faster. He asked me to run it on my 8-core Linux box, and sure enough, it was fast, but didn’t seem to be using the CPU that well.

So, I thought some more about the problem. Last time I said I was using Boyer-Moore searching, but only sort of. This is because I was using Erlang function argument pattern matching, which proceeds forward, not backward as Boyer-Moore does. I couldn’t help but think that I could get more speed by doing that right.
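To illustrate what "proceeds forward" means here, the following is a minimal sketch of matching a fixed prefix in a function head. It is not the earlier search code; the module name `prefix_sketch` is hypothetical, and the literal `/ongoing/When/` is assumed for illustration as the fixed part of the search pattern. The function head consumes the pattern left to right, and a failed match can only advance one byte at a time, with none of the backward comparison or larger skips that Boyer-Moore allows.

```erlang
-module(prefix_sketch).
-export([scan/1]).

%% Forward matching: the first clause matches the fixed prefix from left
%% to right and returns whatever follows it.
scan(<<"/ongoing/When/", Rest/binary>>) ->
    {match, Rest};
%% No match at this position: drop one byte and try again.
scan(<<_, Rest/binary>>) ->
    scan(Rest);
scan(<<>>) ->
    nomatch.
```

Calling `prefix_sketch:scan(<<"GET /ongoing/When/200x/2006">>)` returns `{match, <<"200x/2006">>}`.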

I was also concerned about the speed of reading the input file. Reading it in chunks seems like the obvious thing to do for such a large file (Tim’s sample dataset is 236140827 bytes). It turns out that reading in chunks can cumulatively take over a second using klacke’s bfile module, but it takes only about a third of a second to read the whole thing into memory in one shot. By my measurements the bfile module is noticeably faster at doing this than the Erlang file:read_file/1. Even my MacBook Pro can read the whole dataset without significant trouble, so I imagine the T5120 can do it with ease.

So, I changed tactics:

  • Read the whole dataset into an Erlang binary in one shot, then break it into chunks based on the number of schedulers that the Erlang system is using.
  • Stop breaking chunks into smaller blocks at newline boundaries. This took too much time. Instead, just grab a block, search from the end to find the final newline, and then process it for pattern matches.
  • Change the search module to do something much closer to Boyer-Moore regarding backwards searching, streamline the code that matches the variable portion of the search pattern, and be smarter about skipping ahead on failed matches.
  • Balance the parallelized collection of match data by multiple independent processes against the creation of many small dictionaries that later require merging.
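The first two points can be sketched along these lines. This is only an illustration under stated assumptions, not the code from tbray15.erl: the module name `chunk_sketch` and its functions are hypothetical. Given the whole dataset as one binary, it takes a block of the requested size, scans backward from the end of that block for the final newline, and cuts the block there so that no line is split across blocks.

```erlang
-module(chunk_sketch).
-export([blocks/2, last_newline/1]).

%% Scan backward from the end of Bin for the final newline. Returns the
%% number of bytes up to and including that newline, or 0 if there is none.
last_newline(Bin) ->
    last_newline(Bin, byte_size(Bin)).

last_newline(_Bin, 0) ->
    0;
last_newline(Bin, N) ->
    case binary:at(Bin, N - 1) of
        $\n -> N;
        _ -> last_newline(Bin, N - 1)
    end.

%% Split Bin into blocks of at most BlockSize bytes, each ending on a newline.
blocks(Bin, BlockSize) when BlockSize > 0 ->
    blocks(Bin, BlockSize, []).

blocks(<<>>, _BlockSize, Acc) ->
    lists:reverse(Acc);
blocks(Bin, BlockSize, Acc) when byte_size(Bin) =< BlockSize ->
    lists:reverse([Bin | Acc]);
blocks(Bin, BlockSize, Acc) ->
    <<Head:BlockSize/binary, _/binary>> = Bin,
    case last_newline(Head) of
        0 ->
            %% Simplification for this sketch: a line longer than BlockSize
            %% ends the splitting and the remainder becomes one block.
            lists:reverse([Bin | Acc]);
        Len ->
            <<Block:Len/binary, Rest/binary>> = Bin,
            blocks(Rest, BlockSize, [Block | Acc])
    end.
```

One plausible way to pick the block size from the scheduler count is `byte_size(Bin) div erlang:system_info(schedulers)`, guarded so that it never drops to zero; that choice is an assumption here, not something taken from the original code.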

This new version reads the whole dataset, takes the first chunk, finds the final newline, then kicks off one process to collect matches and a separate process to find the matches. It then moves immediately onto the next block, doing the same thing again. What that means is the main process spends its time finding newlines and launching processes while other processes look for matches and collect them. At the end, the main process collects the collections, merges them, and prints out the top ten.
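The overall shape of that flow, with worker processes producing dictionaries that the main process merges before ranking, might look something like the sketch below. It simplifies the design described above by using one process per block rather than separate finder and collector processes, and it counts whole lines instead of running the real pattern match. The module name `merge_sketch` and its functions are hypothetical.

```erlang
-module(merge_sketch).
-export([run/1, count_lines/1, top/2]).

%% Hypothetical stand-in for the real matcher: count each distinct
%% non-empty line in a block.
count_lines(Block) ->
    Lines = [L || L <- binary:split(Block, <<"\n">>, [global]), L =/= <<>>],
    lists:foldl(fun(L, D) -> dict:update_counter(L, 1, D) end,
                dict:new(), Lines).

%% Spawn one worker per block, then collect and merge the per-block dicts.
run(Blocks) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, count_lines(Block)} end),
                Ref
            end || Block <- Blocks],
    %% Selective receive on each reference, in launch order.
    Dicts = [receive {Ref, D} -> D end || Ref <- Refs],
    Sum = fun(_Key, X, Y) -> X + Y end,
    lists:foldl(fun(D, Acc) -> dict:merge(Sum, D, Acc) end,
                dict:new(), Dicts).

%% Return the N most frequent keys as {Key, Count}, highest count first,
%% with ties ordered by key.
top(N, Dict) ->
    Ranked = lists:sort([{-Count, Key} || {Key, Count} <- dict:to_list(Dict)]),
    [{Key, -NegCount} || {NegCount, Key} <- lists:sublist(Ranked, N)].
```

Because every block produces its own dict, more blocks mean more `dict:merge/3` calls at the end, which is the cost that has to be balanced against the extra parallelism.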

On my 8-core 2.33GHz Linux box with 8 GB of RAM:

$ time erl -smp -noshell -run tbray15 main o1000k.ap
2959: 2006/09/29/Dynamic-IDE
2059: 2006/07/28/Open-Data
1636: 2006/10/02/Cedric-on-Refactoring
1060: 2006/03/30/Teacup
942: 2006/01/31/Data-Protection
842: 2006/10/04/JIS-Reg-FD
838: 2006/10/06/On-Comments
817: 2006/10/02/Size-Matters
682: 2003/09/18/NXML
630: 2003/06/24/IntelligentSearch

real    0m4.124s
user    0m25.124s
sys     0m1.916s

At 4.124s, this is significantly faster than the 6.663s I saw with my previous version. The user time is 6x the elapsed time, so we’re using the cores well. What’s more, if you change the block size, which indirectly controls the number of Erlang processes that run, you can clearly see a pretty much linear speedup as more cores get used. Below is the output from a loop where the block size starts at the file size and then divides by two on each iteration (I’ve edited the output to make it more compact by flattening the time output into three columns):

$ ((x=236140827)) ; while ((x>32768))
do echo $x
    time erl -smp -noshell -run tbray15 main o1000k.ap $x >/dev/null
    ((x/=2))
done

236140827: 0m38.072s, 0m52.159s, 0m3.984s
118070413: 0m18.294s, 0m37.922s, 0m4.571s
59035206:  0m11.374s, 0m36.694s, 0m9.098s
29517603:  0m4.598s,  0m27.825s, 0m2.180s
14758801:  0m4.225s,  0m26.237s, 0m2.134s
7379400:   0m4.181s,  0m25.779s, 0m1.873s
3689700:   0m4.124s,  0m25.124s, 0m1.916s
1844850:   0m4.149s,  0m24.931s, 0m1.969s
922425:    0m4.132s,  0m24.894s, 0m1.822s
461212:    0m4.170s,  0m24.588s, 0m2.026s
230606:    0m4.185s,  0m24.548s, 0m2.035s
115303:    0m4.215s,  0m24.755s, 0m2.025s
57651:     0m4.317s,  0m25.199s, 0m1.985s

The elapsed times for the top four entries show the multiple cores kicking in, roughly doubling the performance each time. Once we hit the 4 second range, performance gains are small but steady roughly down to a block size of 922425, but then the times start to creep up again. My guess is that this is because smaller blocks mean more Erlang dict instances being created to capture matches, and all those dictionaries then have to be merged to collect the final results. In the middle, where performance is best, the user time is roughly 6x the elapsed time as I already mentioned, which means that if Tim runs this on his T5120, he should see excellent performance there as well.

Feel free to grab the files tbray15.erl and wfbm3.erl if you want to try it out for yourself.