The Technology Adoption Side of RPC and REST

September 3rd, 2008  |  Published in column, enterprise, innovation, REST, RPC, technology adoption

My latest Internet Computing column has been available since last Friday. It’s entitled “RPC and REST: Dilemma, Disruption, and Displacement” (PDF, HTML) and like my previous 2008 columns, it explores another angle of the “RPC vs. REST” debate.

Since previous columns have covered many of the technical angles, this time I present the debate from the technology adoption angle. As the abstract for the column says, many technologists tend to treat such debates as if they're purely technical, but of course they're never that black-and-white. What's often behind some of the raging "technical" debates we've all seen or experienced is simply a difference in the arguing parties' relative positions along the Technology Adoption Lifecycle curve. Nobody would be surprised at a disagreement over technology between someone classified as an early adopter or visionary (from the far left of the curve) and someone classified as a technology skeptic (from the far right). Yet we always seem surprised when two people whose preferences aren't far apart on the curve, say at opposite edges of the mainstream band in the middle of the bell curve, don't see eye to eye, even though that scenario is quite common. Even small differences in goals for adopted technologies and in desired risk/reward trade-offs, along with the hidden and unstated assumptions that inevitably accompany them, can spark vigorous debate about which technology or approach is best for a given situation.

When it comes to published explanations of how innovation works and how technologies move along the adoption curve, my favorite author by far is Clayton Christensen. IMO all developers should study and learn from his books, specifically The Innovator’s Dilemma, The Innovator’s Solution, and Seeing What’s Next. All are amazingly insightful works that will open your eyes to how real-life markets react to technological change and advancement.

In this column I try to view and classify the “RPC vs. REST” debate based on Christensen’s theories about innovation and technology adoption. I hope you find it interesting, and as always, I welcome all constructive comments.

You Have to Experience It

August 16th, 2008  |  Published in commentary, erlang, productivity, REST, WS-*

I’ve noticed that frequently in technical discussions, the strongest disagreements seem to come from people with little to no actual experience with the technology they’re arguing against. How can that be? For example:

  • Test-First Development. I wish I had a dollar for every time I've suggested to a developer that writing their tests before or along with writing their code will make the code not only easier to write but also more robust coming out of the gate, only to get back responses like, "What? That's crazy! How can you write tests before you have any code? That doesn't make any sense!" Having an initial reaction like that isn't such a big deal, as I've seen numerous developers who have such reactions actually try the "test first" approach and quickly become strong advocates who wonder how they ever did without it. The point is, though, that they actually tried it. Arguing with them before they tried it always turned out to be a total waste of time. No amount of words seemed to convince them. They had to experience it before they understood it. (A small sketch of the test-first idea appears after this list.)

  • Erlang syntax. Erlang is getting more and more attention these days, and rightfully so, but a typical reaction from those who have written little to no Erlang code is that the language’s syntax is too weird, too hard to read and write, etc. Is the syntax different? Yes. Is it weird or difficult? No, not at all — in fact, it’s actually very simple and regular when compared to popular general-purpose imperative languages. Spend a day or two writing some real Erlang code, and I guarantee you that any initial dislike you might have for its syntax will disappear.

  • REST. If you search the blog of any REST proponent, including this one, you’re sure to find all kinds of comments from detractors who argue against REST despite never having used it to develop any real systems. Similarly, the blogs of many WS-* advocates who have never tried using REST contain all kinds of reasons why REST can’t possibly work. Check out the comments in Damien Katz’s recent “REST, I just don’t get it” posting, for example; you have useful ones from those who have obviously used REST and understand its benefits, and then you have other comments that argue against REST while simultaneously showing a great misunderstanding of it. Those detractors would do well to read Bill de hÓra’s excellent response.
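
For the test-first point above, here's a minimal sketch of the idea. Everything in it is hypothetical, including the parse_port function, and plain asserts stand in for whatever test framework you'd actually use:

    #include <cassert>
    #include <cstddef>
    #include <stdexcept>
    #include <string>

    // Hypothetical function under test, declared first so the test can drive it.
    int parse_port(const std::string& s);

    // The test comes first and pins down the behavior we want.
    void test_parse_port() {
        assert(parse_port("8080") == 8080);   // the normal case
        bool threw = false;
        try {
            parse_port("not-a-number");
        } catch (const std::invalid_argument&) {
            threw = true;
        }
        assert(threw);                         // bad input must throw
    }

    // Only now is the implementation written, guided by the test above.
    int parse_port(const std::string& s) {
        std::size_t pos = 0;
        int port = std::stoi(s, &pos);
        if (pos != s.size() || port < 0 || port > 65535)
            throw std::invalid_argument("bad port: " + s);
        return port;
    }

    int main() { test_parse_port(); }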

Also interesting about these three particular cases is that I don’t personally know of anyone who’s actually tried the approaches and decided against them. In a posting last November, for example, I asked for comments from anyone who had actually tried REST for real and with an open mind, but decided that it was inferior to WS-* and so abandoned it. Either nobody read that posting or no such people exist. I’m fairly confident it’s the latter.

There will always be arguments made by people whose livelihood is somehow threatened by the approach they're opposing, but I don't think that's the source of all the opposing arguments. As developers we can't possibly try everything, of course, because there just isn't enough time. It's inevitable that we'll sometimes have to research an approach only through reading, questions, and discussion, and decide against it without ever prototyping it. But ultimately we developers owe it to ourselves and our employers to keep ourselves objectively informed so that we can take advantage of new approaches whenever appropriate. When a whole bunch of smart developers have success with a particular approach, I don't see how any responsible developer can actively and vocally oppose it without first objectively trying the approach and experiencing it firsthand.

Spot On

July 28th, 2008  |  Published in design, distributed systems, REST, RPC, SOA, WS-*

Eric Newcomer gives us his take on my recent articles (1, 2, 3, 4) and blog postings (1, 2, 3, 4) about RPC, REST, and programming languages:

Anyway, after carefully reading the article and blog entries, I believe Steve is not against RPC per se. He wants people to think before just automatically using it because it’s convenient.

Exactly!

Also spot on are the following postings:

The beauty common to all these postings is the breadth, depth, and variety of thinking and reasoning they present. There’s a lot to read, but if you’re interested in critical thinking about the design and construction of distributed systems I encourage you to read them all the way through, including the comments, and to follow the links they offer as well.

Protocol Buffers: Leaky RPC

July 13th, 2008  |  Published in distributed systems, RPC, services

Mark Pilgrim tells us why Protocol Buffers are so nice. Notice, though, that everything he writes focuses entirely on their form and structure as messages. If you focus only on that perspective, then sure, they’re better than what many could come up with if they were rolling their own. In fact, if Google had stopped there, I think Protocol Buffers could be a superb little package.

But they didn’t stop there. No, they had to include the fundamentally flawed remote procedure call. Well, sort of, anyway:

The Protocol Buffer library does not include an RPC implementation. However, it includes all of the tools you need to hook up a generated service class to any arbitrary RPC implementation of your choice. You need only provide implementations of RpcChannel and RpcController.

Why ruin a perfectly good messaging format by throwing this RPC junk into the package? What if I want to send these messages via some other means, such as message queuing, for example? Do I have to pay for this RPC code if I don’t need it? If my messages don’t include service definitions, do I avoid all that RPC machinery?
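
To make the coupling concrete, here's roughly what using the generated stub looks like in C++, based on the Service/RpcChannel/RpcController interfaces protobuf defines. The SearchService, its request and response types, and MyRpcChannel/MyRpcController are all hypothetical; the latter two are exactly the classes the documentation says you must write yourself:

    #include <string>
    #include <google/protobuf/service.h>    // Service, RpcChannel, RpcController, Closure
    #include "search.pb.h"                  // hypothetical: generated from a SearchService .proto
    #include "my_rpc_channel.h"             // hypothetical: the RpcChannel/RpcController you must supply

    static void Done() {}                   // no-op completion callback

    void call_search() {
        MyRpcChannel channel("remotehost:4321");   // your transport glue, not protobuf's
        MyRpcController controller;                // ditto for error reporting and cancellation
        SearchService_Stub stub(&channel);         // generated by protoc

        SearchRequest request;
        request.set_query("protocol buffers");
        SearchResponse response;

        // The stub simply forwards to channel.CallMethod(); every question about
        // transport, failure semantics, and cancellation is punted to your channel.
        stub.Search(&controller, &request, &response,
                    google::protobuf::NewCallback(&Done));
    }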

In my previous post I talked about the message tunneling problem, where data that don't fit the distributed type system are forced through it by packing them into a type such as string or sequence of octets. Since Protocol Buffers require you to "hook up a generated service class to any arbitrary RPC implementation of your choice," it's likely that you're going to run into this tunneling problem. For example, if you want to send this stuff over IIOP, you're probably going to send the marshaled protobufs as Common Data Representation (CDR) sequences of octet. You thus unavoidably pay for marshaling twice: once at the protobuf level for the protobuf itself, and again at the CDR level to marshal the sequence of octet containing the protobuf. Any worthwhile IIOP/CDR implementation will be very fast at marshaling sequences of octet, but still, overhead is overhead.
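
Just to illustrate, a sketch of that double marshaling in C++ might look like the following, assuming the classic CORBA C++ mapping and a hypothetical MyMessage protobuf:

    #include <cstring>
    #include <string>
    #include "my_message.pb.h"   // hypothetical protobuf message
    #include "your_orb.h"        // placeholder for your ORB's headers (TAO, omniORB, etc.)

    // Marshal the protobuf once, then hand the bytes to CORBA, which will
    // marshal the sequence<octet> again as CDR when the request goes out.
    CORBA::OctetSeq* tunnel_protobuf(const MyMessage& msg) {
        std::string bytes;
        msg.SerializeToString(&bytes);                        // marshal #1 (protobuf)

        CORBA::OctetSeq* seq = new CORBA::OctetSeq;
        seq->length(static_cast<CORBA::ULong>(bytes.size()));
        std::memcpy(seq->get_buffer(), bytes.data(), bytes.size());  // marshal #2 awaits in CDR
        return seq;
    }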

But there are other problems too. What about errors? If something goes wrong with the RPC call, how do I figure that out? The answer appears to be that you call the RpcController to see if there was a failure, and if so, call it again to get a string indicating what the failure was. A string? This implies that I not only have to write code to convert exceptions or status codes from the underlying RPC implementation into strings, but also write code to convert them back again into some form of exception, assuming my RPC-calling code wants to throw exceptions to indicate problems to the code that calls it.
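
Here's a sketch of what that looks like in practice. Failed() and ErrorText() are the actual RpcController methods; the RemoteCallError exception and any logic for classifying the text are things you'd have to invent yourself:

    #include <stdexcept>
    #include <string>
    #include <google/protobuf/service.h>

    // Your own exception type; the RPC abstraction doesn't define any.
    struct RemoteCallError : std::runtime_error {
        explicit RemoteCallError(const std::string& what) : std::runtime_error(what) {}
    };

    void check_call(const google::protobuf::RpcController& controller) {
        if (!controller.Failed())
            return;
        // All that's available is free-form text; whatever structure the underlying
        // RPC system had (error codes, exception types, retry hints) is already gone.
        throw RemoteCallError(controller.ErrorText());
    }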

What about idempotency? If something goes wrong, how do I know how far the call got? Did it fail before it ever got out of my process, or off my host? Did it make it to the remote host? Did it make it into the remote process, but fail before it reached the service implementation? Or did it fail sometime after the service processed it, as the response made its way back to me? If the call I'm making is not idempotent and I want to retry it after a failure, I absolutely need to know this sort of information. Unfortunately, Protocol Buffers supplies nothing whatsoever to help with this problem, instead apparently punting to the underlying RPC implementation.

Still more problems: the RpcController offers methods for canceling remote calls. What if the underlying RPC package doesn’t support this? Over the years I’ve seen many that don’t. Note that this capability impacts the idempotency problem as well.

Another question: what about service references? As far as I can see, the protobuf language doesn’t support such things. How can one service return a message that contains a reference to another service? I suspect the answer is, once again, data tunneling — you would encode your service reference using a form supported by the underlying RPC implementation, and then pass that back as a string or sequence of bytes. For example, if you were using CORBA underneath, you might represent the other service using a stringified object reference and return that as a string. Weak.
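
A sketch of what that tunneling might look like with CORBA underneath, assuming a hypothetical DirectoryReply message that carries the reference in a plain string field:

    #include <string>
    #include "your_orb.h"        // placeholder for your ORB's headers
    #include "directory.pb.h"    // hypothetical message with a string field for the reference

    // Protobuf has no notion of a service reference, so we stringify the CORBA
    // object reference and tunnel it through an ordinary string field.
    void fill_reply(CORBA::ORB_ptr orb, CORBA::Object_ptr other_service,
                    DirectoryReply* reply) {
        CORBA::String_var ior = orb->object_to_string(other_service);
        reply->set_other_service_ref(ior.in());      // hypothetical field
    }

    // Callers must know, out of band, that this particular string is really an IOR.
    CORBA::Object_ptr resolve_reply(CORBA::ORB_ptr orb, const DirectoryReply& reply) {
        return orb->string_to_object(reply.other_service_ref().c_str());
    }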

All in all, the Protocol Buffers service abstraction is very leaky. It doesn't give us exceptions or any way of dealing with failure beyond a human-readable string. It doesn't give us service references, so we have no way to let one service refer to another within a protobuf message. We're thus forced to work simultaneously at both the Protocol Buffers level and the underlying RPC implementation level if we have any hope of dealing with these very real-world issues.

My advice to Google, then, is to just drop all the service and RPC stuff. Seriously. It causes way more problems than it's worth, it sends people down a fundamentally flawed distributed computing path, and it takes away from what is otherwise a nice message format and structure. If Google can't or won't drop it, they should at least de-emphasize it by relegating it to an appendix in the documentation. And if they choose to keep it at its current level of prominence, they should answer the kinds of questions I've raised here, modifying their APIs where necessary to address the issues.

Protocol Buffers: No Big Deal

July 11th, 2008  |  Published in distributed systems, HTTP, RPC, services

I’ve gotten a few emails asking me to blog my opinion of Google’s Protocol Buffers. Well, I guess I pretty much share Stefan’s opinion. I certainly don’t see this stuff providing anything tremendously innovative, so as with Cisco Etch, it seems to me to be just another case of NIH.

Ted Neward already wrote a pretty thorough analysis — it’s almost 0.85 Yegges in length! — so I’ll just refer you to him. There are at least two important points he made that bear repeating, though:

Which, by the way, brings up another problem, the same one that plagues CORBA, COM/DCOM, WSDL-based services, and anything that relies on a shared definition file that is used for code-generation purposes, what I often call The Myth of the One True Schema.

Indeed. Usually when you point this out, those who disagree come back with, "Oh, that doesn't matter — you can just send whatever data you want as a string or as an array of bytes!" Having been forced to do just that numerous times back in my CORBA days, I know that's not a good answer. You have a bunch of distributed infrastructure trying to enforce your One True Schema Type System, yet you go the extra mile to tunnel other types that don't fit that schema through it all, and the extra layers can end up being especially complicated and slow.

The second point I’ll quote from Ted says it all:

Don’t lose sight of the technical advantages or disadvantages of each of those solutions just because something has the Google name on it.

Most excellent advice.

There’s one last thing I’ll quote. This one’s not from Ted, but directly from the Protocol Buffers documentation:

For example, you might implement an RpcChannel which serializes the message and sends it to a server via HTTP.

Sigh.
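
In case it's not obvious why that sentence warrants a sigh, here's roughly the shape of such a channel. The CallMethod signature is protobuf's own RpcChannel interface; http_post is a hypothetical stand-in for whatever HTTP client you'd bolt on:

    #include <string>
    #include <google/protobuf/descriptor.h>
    #include <google/protobuf/message.h>
    #include <google/protobuf/service.h>

    // Hypothetical helper: POST a body to a URL and return the response body.
    std::string http_post(const std::string& url, const std::string& body);

    class HttpRpcChannel : public google::protobuf::RpcChannel {
     public:
        explicit HttpRpcChannel(const std::string& base_url) : base_url_(base_url) {}

        void CallMethod(const google::protobuf::MethodDescriptor* method,
                        google::protobuf::RpcController* controller,
                        const google::protobuf::Message* request,
                        google::protobuf::Message* response,
                        google::protobuf::Closure* done) {
            // Serialize the request and tunnel it through a POST keyed only by the
            // method's full name; none of HTTP's uniform interface (resources,
            // verbs, status codes, caching) actually gets used.
            std::string body;
            request->SerializeToString(&body);
            std::string reply = http_post(base_url_ + "/" + method->full_name(), body);
            if (!response->ParseFromString(reply))
                controller->SetFailed("could not parse response");
            done->Run();
        }

     private:
        std::string base_url_;
    };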

[Update: more opinions — and some questions — in my next post.]