RPC

The Technology Adoption Side of RPC and REST

September 3rd, 2008  |  Published in REST, RPC, column, enterprise, innovation, technology adoption  |  Add to del.icio.us

My latest Internet Computing column has been available since last Friday. It’s entitled “RPC and REST: Dilemma, Disruption, and Displacement” (PDF, HTML) and like my previous 2008 columns, it explores another angle of the “RPC vs. REST” debate.

Since previous columns have covered many of the technical angles, this time I present the debate from the technology adoption angle. As the abstract for the column says, many technologists tend to treat such debates as if they’re purely technical, but of course they’re never that black-and-white. What’s often behind some of the raging “technical” debates we’ve all seen or experienced is simply the difference between the arguing parties in their relative positions along the Technology Adoption Lifecycle curve. Nobody would be surprised at a disagreement over technology between someone classified as an early adopter or visionary (from the far left of the curve) and someone classified as a technology skeptic (from the far right), yet we always seem surprised when two people whose preferences aren’t too far apart on the curve — from the opposite edges of the mainstream band in the middle of the bell curve, for example — don’t see eye to eye, despite the fact that this sort of scenario is quite common. Even small differences in goals for adopted technologies and desired risk/reward trade-offs, along with the inevitable hidden and unstated assumptions resulting from such factors, can cause vigorous debate about what technology or approach is best for a given situation.

When it comes to published explanations of how innovation works and how technologies move along the adoption curve, my favorite author by far is Clayton Christensen. IMO all developers should study and learn from his books, specifically The Innovator’s Dilemma, The Innovator’s Solution, and Seeing What’s Next. All are amazingly insightful works that will open your eyes to how real-life markets react to technological change and advancement.

In this column I try to view and classify the “RPC vs. REST” debate based on Christensen’s theories about innovation and technology adoption. I hope you find it interesting, and as always, I welcome all constructive comments.

Spot On

July 28th, 2008  |  Published in REST, RPC, SOA, WS-*, design, distributed systems  |  Add to del.icio.us

Eric Newcomer gives us his take on my recent articles (1, 2, 3, 4) and blog postings (1, 2, 3, 4) about RPC, REST, and programming languages:

Anyway, after carefully reading the article and blog entries, I believe Steve is not against RPC per se. He wants people to think before just automatically using it because it’s convenient.

Exactly!

Also spot on are the following postings:

The beauty common to all these postings is the breadth, depth, and variety of thinking and reasoning they present. There’s a lot to read, but if you’re interested in critical thinking about the design and construction of distributed systems I encourage you to read them all the way through, including the comments, and to follow the links they offer as well.

Protocol Buffers: Leaky RPC

July 13th, 2008  |  Published in RPC, distributed systems, services  |  Add to del.icio.us

Mark Pilgrim tells us why Protocol Buffers are so nice. Notice, though, that everything he writes focuses entirely on their form and structure as messages. If you focus only on that perspective, then sure, they’re better than what many could come up with if they were rolling their own. In fact, if Google had stopped there, I think Protocol Buffers could be a superb little package.

But they didn’t stop there. No, they had to include the fundamentally flawed remote procedure call. Well, sort of, anyway:

The Protocol Buffer library does not include an RPC implementation. However, it includes all of the tools you need to hook up a generated service class to any arbitrary RPC implementation of your choice. You need only provide implementations of RpcChannel and RpcController.

Why ruin a perfectly good messaging format by throwing this RPC junk into the package? What if I want to send these messages via some other means, such as message queuing, for example? Do I have to pay for this RPC code if I don’t need it? If my messages don’t include service definitions, do I avoid all that RPC machinery?

In my previous post I talked about the message tunneling problem, where data that doesn’t fit the distributed type system are forced through the system by packing them into a type such as string or sequence of octets. Since Protocol Buffers require you to “hook up a generated service class to any arbitrary RPC implementation of your choice,” it’s likely that you’re going to run into this tunneling problem. For example, if you want to send this stuff over IIOP, you’re likely going to send the marshaled protobufs as Common Data Representation (CDR) sequences of octet. You’re thus unavoidably paying for marshaling twice: once at the protobuf level for the protobuf itself, and then again at the CDR level to marshal the sequence of octet containing the protobuf. Any worthwhile IIOP/CDR implementation will be very fast at marshaling sequences of octet, but still, overhead is overhead.

But there are other problems too. What about errors? If something goes wrong with the RPC call, how do I figure that out? The answer appears to be that you call the RpcController to see if there was a failure, and if so, call it again to get a string indicating what the failure was. A string? This implies that I not only have to write code to convert exceptions or status codes from the underlying RPC implementation into strings, but also write code to convert them back again into some form of exception, assuming my RPC-calling code wants to throw exceptions to indicate problems to the code that calls it.

What about idempotency? If something goes wrong, how do I know how far the call got? Did it fail before it ever got out of my process, or off my host? Did it make it to the remote host? Did it make it into the remote process, but failed before it reached the service implementation? Or did it fail sometime after the service processed it, as the response made its way back to me? If the call I’m making is not idempotent, and I want to try it again if I hit a failure, then I absolutely need to know this sort of information. Unfortunately, Protocol Buffers supplies nothing whatsoever to help with this problem, instead apparently punting to the underlying RPC implementation.

Still more problems: the RpcController offers methods for canceling remote calls. What if the underlying RPC package doesn’t support this? Over the years I’ve seen many that don’t. Note that this capability impacts the idempotency problem as well.

Another question: what about service references? As far as I can see, the protobuf language doesn’t support such things. How can one service return a message that contains a reference to another service? I suspect the answer is, once again, data tunneling — you would encode your service reference using a form supported by the underlying RPC implementation, and then pass that back as a string or sequence of bytes. For example, if you were using CORBA underneath, you might represent the other service using a stringified object reference and return that as a string. Weak.

All in all, the Protocol Buffers service abstraction is very leaky. It doesn’t give us exceptions or any ways of dealing with failure except a human-readable string. It doesn’t give us service references, so we have no way to let one service refer to another within a protobuf message. We are thus forced to work in our code simultaneously at both the Protocol Buffers level and also at the underlying RPC implementation level if we have any hope of dealing with these very-real-world issues.

My advice to Google, then, is to just drop all the service and RPC stuff. Seriously. It causes way more problems than it’s worth, it sends people down a fundamentally flawed distributed computing path, and it takes away from what is otherwise a nice message format and structure. If Google can’t or won’t drop it, then they should either remove focus from this aspect by relegating this stuff to an appendix in the documentation, or if they choose to keep it all at the current level of focus, then they should clarify all the questions of the sort I’ve raised here, potentially modifying their APIs to address the issues.

Protocol Buffers: No Big Deal

July 11th, 2008  |  Published in HTTP, RPC, distributed systems, services  |  Add to del.icio.us

I’ve gotten a few emails asking me to blog my opinion of Google’s Protocol Buffers. Well, I guess I pretty much share Stefan’s opinion. I certainly don’t see this stuff providing anything tremendously innovative, so as with Cisco Etch, it seems to me to be just another case of NIH.

Ted Neward already wrote a pretty thorough analysis — it’s almost 0.85 Yegges in length! — so I’ll just refer you to him. There are at least two important points he made that bear repeating, though:

Which, by the way, brings up another problem, the same one that plagues CORBA, COM/DCOM, WSDL-based services, and anything that relies on a shared definition file that is used for code-generation purposes, what I often call The Myth of the One True Schema.

Indeed. Usually when one points this out, those who disagree with you come back with, “Oh, that doesn’t matter — you can just send whatever data you want as a string or as an array of bytes!” Having been forced to do just that numerous times back in my CORBA days, I know that’s not a good answer. You have a bunch of distributed infrastructure trying to enforce your One True Schema Type System, yet you go the extra mile to tunnel other types that don’t fit that schema through it all, and the extra layers can end up being especially complicated and slow.

The second point I’ll quote from Ted says it all:

Don’t lose sight of the technical advantages or disadvantages of each of those solutions just because something has the Google name on it.

Most excellent advice.

There’s one last thing I’ll quote. This one’s not from Ted, but directly from the Protocol Buffers documentation:

For example, you might implement an RpcChannel which serializes the message and sends it to a server via HTTP.

Sigh.

[Update: more opinions — and some questions — in my next post.]

Convenience Over Correctness

July 1st, 2008  |  Published in REST, RPC, column, integration, messaging  |  Add to del.icio.us

My latest Internet Computing column, “Convenience Over Correctness,” (PDF) is now available. It continues the exploration of problems with RPC-oriented distributed programming approaches that I’ve been writing about in each of my three prior columns this year, as well as in columns from years gone by and in the erlang-questions mailing list.

For years we’ve known RPC and its descendants to be fundamentally flawed, yet many still willingly use the approach. Why? I believe the reason is simply convenience. Regardless of RPC’s well-understood problems, many developers continue to go down the RPC-oriented path because it conveniently fits the abstractions of the popular general-purpose programming languages they limit themselves to using. Making a function or method call to a remote or distributed function, object, or service appear just like any other function or method call allows such developers to stay within the comfortable confines of their language. Those who choose this approach essentially decide that developer convenience and comfort is more important than dealing with hard distribution issues like latency, concurrency, reliability, scalability, and partial failure.

Is this convenience for the developer the right thing to focus on? I really, really don’t think it is. There are ways of developing robust distributed applications that don’t require code-generation toolkits, piles of special code annotations, or brittle enterprisey frameworks. Perhaps the wonderful programming language renaissance we’re currently experiencing will help us to finally see the light and put tired old broken abstractions like RPC permanently out to pasture.