distributed systems :: Steve Vinoski's Blog

distributed systems

Protocol Buffers: Leaky RPC

July 13th, 2008 | Published in distributed systems, RPC, services | Bookmark on Pinboard.in

Mark Pilgrim tells us why Protocol Buffers are so nice. Notice, though, that everything he writes focuses entirely on their form and structure as messages. If you focus only on that perspective, then sure, they’re better than what many could come up with if they were rolling their own. In fact, if Google had stopped there, I think Protocol Buffers could be a superb little package.

But they didn’t stop there. No, they had to include the fundamentally flawed remote procedure call. Well, sort of, anyway:

The Protocol Buffer library does not include an RPC implementation. However, it includes all of the tools you need to hook up a generated service class to any arbitrary RPC implementation of your choice. You need only provide implementations of RpcChannel and RpcController.

Why ruin a perfectly good messaging format by throwing this RPC junk into the package? What if I want to send these messages via some other means, such as message queuing, for example? Do I have to pay for this RPC code if I don’t need it? If my messages don’t include service definitions, do I avoid all that RPC machinery?

In my previous post I talked about the message tunneling problem, where data that doesn’t fit the distributed type system are forced through the system by packing them into a type such as string or sequence of octets. Since Protocol Buffers require you to “hook up a generated service class to any arbitrary RPC implementation of your choice,” it’s likely that you’re going to run into this tunneling problem. For example, if you want to send this stuff over IIOP, you’re likely going to send the marshaled protobufs as Common Data Representation (CDR) sequences of octet. You’re thus unavoidably paying for marshaling twice: once at the protobuf level for the protobuf itself, and then again at the CDR level to marshal the sequence of octet containing the protobuf. Any worthwhile IIOP/CDR implementation will be very fast at marshaling sequences of octet, but still, overhead is overhead.

But there are other problems too. What about errors? If something goes wrong with the RPC call, how do I figure that out? The answer appears to be that you call the RpcController to see if there was a failure, and if so, call it again to get a string indicating what the failure was. A string? This implies that I not only have to write code to convert exceptions or status codes from the underlying RPC implementation into strings, but also write code to convert them back again into some form of exception, assuming my RPC-calling code wants to throw exceptions to indicate problems to the code that calls it.

What about idempotency? If something goes wrong, how do I know how far the call got? Did it fail before it ever got out of my process, or off my host? Did it make it to the remote host? Did it make it into the remote process, but failed before it reached the service implementation? Or did it fail sometime after the service processed it, as the response made its way back to me? If the call I’m making is not idempotent, and I want to try it again if I hit a failure, then I absolutely need to know this sort of information. Unfortunately, Protocol Buffers supplies nothing whatsoever to help with this problem, instead apparently punting to the underlying RPC implementation.

Still more problems: the RpcController offers methods for canceling remote calls. What if the underlying RPC package doesn’t support this? Over the years I’ve seen many that don’t. Note that this capability impacts the idempotency problem as well.

Another question: what about service references? As far as I can see, the protobuf language doesn’t support such things. How can one service return a message that contains a reference to another service? I suspect the answer is, once again, data tunneling — you would encode your service reference using a form supported by the underlying RPC implementation, and then pass that back as a string or sequence of bytes. For example, if you were using CORBA underneath, you might represent the other service using a stringified object reference and return that as a string. Weak.

All in all, the Protocol Buffers service abstraction is very leaky. It doesn’t give us exceptions or any ways of dealing with failure except a human-readable string. It doesn’t give us service references, so we have no way to let one service refer to another within a protobuf message. We are thus forced to work in our code simultaneously at both the Protocol Buffers level and also at the underlying RPC implementation level if we have any hope of dealing with these very-real-world issues.

My advice to Google, then, is to just drop all the service and RPC stuff. Seriously. It causes way more problems than it’s worth, it sends people down a fundamentally flawed distributed computing path, and it takes away from what is otherwise a nice message format and structure. If Google can’t or won’t drop it, then they should either remove focus from this aspect by relegating this stuff to an appendix in the documentation, or if they choose to keep it all at the current level of focus, then they should clarify all the questions of the sort I’ve raised here, potentially modifying their APIs to address the issues.

Protocol Buffers: No Big Deal

July 11th, 2008 | Published in distributed systems, HTTP, RPC, services | Bookmark on Pinboard.in

I’ve gotten a few emails asking me to blog my opinion of Google’s Protocol Buffers. Well, I guess I pretty much share Stefan’s opinion. I certainly don’t see this stuff providing anything tremendously innovative, so as with Cisco Etch, it seems to me to be just another case of NIH.

Ted Neward already wrote a pretty thorough analysis — it’s almost 0.85 Yegges in length! — so I’ll just refer you to him. There are at least two important points he made that bear repeating, though:

Which, by the way, brings up another problem, the same one that plagues CORBA, COM/DCOM, WSDL-based services, and anything that relies on a shared definition file that is used for code-generation purposes, what I often call The Myth of the One True Schema.

Indeed. Usually when one points this out, those who disagree with you come back with, “Oh, that doesn’t matter — you can just send whatever data you want as a string or as an array of bytes!” Having been forced to do just that numerous times back in my CORBA days, I know that’s not a good answer. You have a bunch of distributed infrastructure trying to enforce your One True Schema Type System, yet you go the extra mile to tunnel other types that don’t fit that schema through it all, and the extra layers can end up being especially complicated and slow.

The second point I’ll quote from Ted says it all:

Don’t lose sight of the technical advantages or disadvantages of each of those solutions just because something has the Google name on it.

Most excellent advice.

There’s one last thing I’ll quote. This one’s not from Ted, but directly from the Protocol Buffers documentation:

For example, you might implement an RpcChannel which serializes the message and sends it to a server via HTTP.

Sigh.

[Update: more opinions — and some questions — in my next post.]

Joe Armstrong, Erlang, and RPC

May 27th, 2008 | Published in distributed systems, erlang, languages, RPC | Bookmark on Pinboard.in

Joe Armstrong explains the background of the distributed computing capabilities within Erlang.

I find postings like Joe’s highly valuable. Few among us are language designers, so rarely do those of us who aren’t get a first-hand account from someone who is as to why and how something within a given language’s design came to be. Joe describes the distribution primitives that Erlang provides as well as their composability; it might seem simple, but anyone who’s written non-trivial distributed computing infrastructure knows that choosing the right primitives and making the right design trade-offs is anything but simple. This explains why I continue to be so impressed with the design choices and trade-offs Joe and crew made for Erlang — I’ve simply never seen any distributed computing infrastructure so elegant and yet so practical and capable.

Defending Something Other Than RPC

May 24th, 2008 | Published in CORBA, distributed systems, HTTP, messaging, objects, RPC | Bookmark on Pinboard.in

Josh Haberman takes me to task for my previous posting:

Steve Vinoski has come out very vocally against RPC in the last few days…

Actually, I’ve been saying similar things for years now, Josh, not just the last few days. For example, I noted problems with RPC in my Mar/Apr 2008 IEEE Internet Computing column entitled “Demystifying RESTful Data Coupling.” I noted problems with RPC in my Sep/Oct 2005 column entitled “RPC Under Fire.” I noted problems with RPC in my Jul/Aug 2002 column entitled “Web Services Interaction Models, Part 2: Putting the ‘Web’ Into Web Services.”

His blog entry basically makes fun of Cisco for inventing/releasing another RPC system. It’s not clear exactly what he thinks they should have done instead.

I think my posting pretty clearly implies that Cisco should have avoided writing their own and instead should have reused something that already exists.

What is strange about this criticism is that tons of technology companies have developed their own RPC system — Facebook and Cisco publicly, and other technology companies I am familiar with in a not-so-public way. Guess what: large commercial distributed systems are built largely on RPC. Is he arguing that all of the engineers at these companies simultaneously got the bad idea of investing in something they don’t need? If RPC is such a bad idea, then why is everybody doing it?

Is everybody really doing it? Are large commercial distributed systems really built largely on RPC? I’ve seen some non-trivial CORBA-based deployments over the years, but in my experience large systems are built using approaches other than RPC. Like the Web, which isn’t RPC. Like email, which isn’t RPC. Like pub/sub enterprise messaging systems, which aren’t RPC.

Let’s consider the definition of what an RPC actually is. The term is often misused to mean “a synchronous call to another system over the network.” This is not what an RPC is. For example, an HTTP request is synchronous, but it is not an RPC. RPC, rather, is a specific approach for developing networked applications where local calls wrap and hide operations that happen to be carried out on another system across the network. For starters, let’s check Wikipedia:

Remote procedure call (RPC) is a technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction. That is, the programmer would write essentially the same code whether the subroutine is local to the executing program, or remote.

Next, let’s check RFC 707, where RPC comes from, in which James E. White specifically proposed a procedure call model for networked applications designed to hide the network, and thereby allow developers to use familiar approaches to developing applications that happened to perform network operations. Quoting from that RFC:

Ideally, the goal of both the Protocol and its accompanying RTE is to make remote resources as easy to use as local ones. Since local resources usually take the form of resident and/or library subroutines, the possibility of modeling remote commands as “procedures” immediately suggests itself. The Model is further confirmed by the similarity that exists between local procedures and the remote commands to which the Protocol provides access. Both carry out arbitrarily complex, named operations on behalf of the requesting program (the caller); are governed by arguments supplied by the caller; and return to it results that reflect the outcome of the operation. The procedure call model thus acknowledges that, in a network environment, programs must sometimes call subroutines in machines other than their own.

and also:

“The procedure call model would elevate the task of creating applications protocols to that of defining procedures and their calling sequences. It would also provide the foundation for a true distributed programming system (DPS) that encourages and facilitates the work of the applications programmer by gracefully extending the local programming environment, via the RTE, to embrace modules on other machines.” This integration of local and network programming environments can even be carried as far as modifying compilers to provide minor variants of their normal procedure-calling constructs for addressing remote procedures (for which calls to the appropriate RTE primitives would be dropped out).

Josh continues:

Yes, on a network sh*t happens, and no sane RPC system will try to hide this from you.

As you can see from the original definition of RPC, something called an RPC that doesn’t hide the network is, by definition, not an RPC. As I said above, unfortunately the term is often misused as meaning “synchronous messaging,” and that incorrect usage seems to be what Josh is defending. Josh then says:

But then again, I don’t know of any RPC system that tries to hide this from you except possibly CORBA.

That’s not correct either. What CORBA actually does is make everything appear remote, even local objects, but does so in a way that allows object request broker (ORB) implementations to bypass much of the overhead of remote invocations when the ORB knows that a target object is local. Still, not all the overhead can be eliminated due to object lifecycle and method dispatching requirements, meaning that such local calls are typically never as fast as true local calls. DCE also treats services as always being remote, but last I checked it included no local bypass optimizations (though a variant called OODCE once did this, IIRC). But either way, what’s important with these systems is that calls within your code look just like any other calls within your code, whether they’re calling remote operations or not. And that’s RPC.

Regarding versioning problems, Josh says:

But any RPC framework worth its salt makes it possible to have different interface versions interoperate. Adding a new parameter? No problem, old servers simply won’t see it. Completely changing the semantics of your call? No problem — just give the new call a new name.

Yes, Josh, there are generally ways to do versioning in such systems, but they’re not very good. CORBA includes some facilities to help with versioning, but in practice they don’t actually help that much. Both COM and CORBA promoted interface inheritance and runtime interface negotiation (called “narrowing” in CORBA) as a way to do versioning, which works, but only for a restricted set of changes. Add a parameter to an existing call? Sorry, no can do, unless your marshaling format carries complete information for the entire call including parameter names, types, and positions, and also versions each parameter, all of which systems like CORBA, DCOM, and DCE specifically do not do due to the large overhead it adds whether a given application uses it or not, and in CORBA’s case also because of the interference it can cause for local dispatching optimizations. All in all, versioning is hard, not only for RPC, but for distributed systems in general.

Middleware and distributed systems veterans are well aware of the arguments like the ones I’ve made in my blog and other places recently and in various publications over the years; such arguments are generally common knowledge among us, and have been for years.

Cisco’s system is not available yet, but when it comes out, I’m quite certain you’ll find, Josh, that it’s the same old thing, just repackaged in a new box.

Just What We Need: Another RPC Package

May 22nd, 2008 | Published in commentary, distributed systems, IDL, integration, RPC, SOAP, WSDL | Bookmark on Pinboard.in

I see from this CIO Magazine article that Cisco is releasing a new client/server messaging system called Etch. Sigh — those who don’t know history are indeed doomed to repeat it. Some choice quotes from the article:

This week Cisco Systems announced a new messaging protocol intended to allow developers to integrate client/server applications without the overhead of traditional protocols such as SOAP.

I was unaware that SOAP had become “traditional.”

One of its design goals was to create an inter-application communications technology without SOAP’s complexity and overhead, explained Marascio. While SOAP relies on a very complicated WSDL file to define the interface between the client and server, Etch uses a file in Cisco’s own interface definition language that shares many similarities to a Java interface file.

I bet this new IDL is not only simpler than WSDL, but it probably also avoids all the impedance mismatch problems that invariably occur when mapping IDL to programming languages.

In addition to a simplified configuration, Etch also promises less overhead over the wire, compared to SOAP. In a testbed environment where SOAP was managing around 900 calls a second, Etch generated more than 50,000 messages in a one-way mode, and 15,000 transactions with a full round-trip, company officials stated.

Oh good, the “performance presumption.” So now we’re back to where we were a decade ago, at least as far as message transfer rates go. I wonder if Etch also solves the problem that the bottlenecks usually lie elsewhere?

The Etch integration into Visual Studio and Eclipse will be very familiar to anyone who has used SOAP integration tools. After authoring the IDL definition, the developer tells the IDE to generate either a client stub or a server skeleton. The client stub is usable immediately; the developer needs only to configure the transport and endpoint, and to code the message calls.

On the server, the developer takes the skeleton and implements the business logic that lives inside the message handlers.

Now that’s what I call innovation!

Projects implementing their communications using Etch aren’t out of luck if they need to interoperate with SOAP, JSON, REST or other existing protocols. Cisco has already demonstrated the capability to easily create bridges between Etch and SOAP, according to Marascio. He said that turnkey bridges to SOAP and REST should be available six to nine months after the release of Etch.

Or, to put it another way: Etch is really just adding more stuff to be developed, tested, deployed, managed, maintained, and integrated, yet it doesn’t actually solve any new problems or solve any old problems better than what already exists.

Cisco also is examining the possibility of establishing Etch as a standard. Marascio pointed out that Cisco is well represented in the IETF, the main standards body for Internet protocols. Alternatively, Cisco might attempt to promote Etch as an industry standard, an effort that would be aided by Etch’s open source nature.

Well of course you want to standardize it — where would any new NIH RPC protocol be without an accompanying standards effort? Rather than the IETF, though, perhaps you ought to get those ISO OOXML guys to rubber-stamp it?

I find it hard to believe that in 2008 people are still inventing stuff like this. Sheesh. Color me underwhelmed.

Steve Vinoski's Blog