A number of folks planning to submit to our call for papers for “Programmatic Interfaces for Web Applications” requested an extension on the submission deadline, so we’ve decided to do just that, extending the deadline by a week from Nov. 1 to Nov. 8. We’re already expecting a pretty good number of submissions, and hopefully the extra week will allow for more submissions and submissions of extra high quality. If you have any questions, don’t hesitate to contact us.
My Nov./Dec. Internet Computing column is now available. It’s entitled “RESTful Web Services Development Checklist“ and as its name implies, it covers some of the primary areas developers need to focus on to write good RESTful web services. These areas are:
- Resources and their URIs
- Applications and Hypermedia
- Representations and Media Types
Regarding the “Applications and Hypermedia” area, I feel Roy Fielding’s pain that many efforts labeled as being RESTful seem to completely ignore the hypermedia constraint. I believe many developers tend to miss this constraint because they’re so used to using libraries and frameworks that offer lots of entry points, and having knowledge of those entry points in the client normally isn’t that bad since the client and library/framework are tightly coupled into the same address space anyway. In a distributed system, though, this definitely does not hold true; when the client knows a bunch of entry points into the service, it ties the client to that service and inhibits their independent evolution.
Anyway, please read the column and let me know what you think, and thanks again to Stefan Tilkov for his helpful review of the draft.
Coincidentally I also feel Roy’s pain when it comes to writing about REST. He states:
I don’t try to tell them exactly what to do because, quite frankly, I don’t have anywhere near enough knowledge of their specific context to make such a decision.
So, when you find it hard to understand what I have written, please don’t think of it as talking above your head or just too philosophical to be worth your time. I am writing this way because I think the subject deserves a particular form of precision. Instead, take the time to look up the terms. Think of it as an opportunity to learn something new, not because I said so, but because it will do you some personal good to better understand the depth of our field.
Obviously, Roy is the ultimate REST authority, given that he defined it, so I’m not at all claiming to be anywhere near as authoritative about it as he is, yet I’ve also experienced what he says above. For example, consider this informal review of my columns I received a few months ago in a comment on someone else’s blog:
The articles of yours that I’ve read are…amorphous to me. They speak in generalities. I haven’t read an article where you sit down and write the same service using both REST and RPC and compare the two. When you speak in generalities, we can’t objectively evaluate any of the specific trade-offs between approaches… Arguments that happen at too abstract a level can’t go anywhere, because our positions aren’t specific enough for anyone to evaluate anybody else’s claims.
In other words, “since your columns don’t do my thinking and experimentation for me, they’re useless to me.” Hmm. Maybe I’m just old school, but I’d much rather understand mathematics than require someone to hold my hand while I blindly punch buttons on a calculator. In other words, as the old proverb goes, I’d much rather try to teach you to fish so you can feed yourself. As I state in this new column:
Whether developers of RESTful HTTP-based services write their code in IDEs or with simple text editors, and regardless of which programming languages they use, they must understand REST and HTTP fundamentals to succeed.
If you’re writing RESTful web services, you need to be able to parse HTTP
Content-type headers. The
Accept header tells you what media types the client can handle, and
Content-type tells you what the client is sending when it invokes
Several years ago Joe Gregorio wrote a really nice article explaining how to properly handle such parsing, and he implemented his ideas in the form of the
mimeparse Python module. Later, others added Ruby and PHP versions.
In my Erlang web services work I’ve been doing just enough to parse my
Accept headers, knowing full well my approach had holes in it and would eventually need to be done right. When I recently stumbled on Joe’s article and code, I decided to port it to Erlang and use it. Joe’s code is very clean and has accompanying unit tests, so the port was pretty easy. However, in the process of plugging it into my system I found a reasonably common case where the
best_match function would return a wildcard match in preference to an exact match, so I added unit tests for that case and repaired the problem for all four languages.
I also found that Java’s
URLConnection class, or maybe it’s
HttpURLConnection, sends a plain “*” as a wildcard MIME type, which I don’t believe is legal (please correct me if I’m wrong), but since I figure there’s probably more than a few clients out there based on that code, I modified the four
mimeparse implementations to handle that as well, treating it as “*/*”.
All four implementations are available from the
mimeparse project home page.
Mark Pilgrim tells us why Protocol Buffers are so nice. Notice, though, that everything he writes focuses entirely on their form and structure as messages. If you focus only on that perspective, then sure, they’re better than what many could come up with if they were rolling their own. In fact, if Google had stopped there, I think Protocol Buffers could be a superb little package.
The Protocol Buffer library does not include an RPC implementation. However, it includes all of the tools you need to hook up a generated service class to any arbitrary RPC implementation of your choice. You need only provide implementations of
Why ruin a perfectly good messaging format by throwing this RPC junk into the package? What if I want to send these messages via some other means, such as message queuing, for example? Do I have to pay for this RPC code if I don’t need it? If my messages don’t include
service definitions, do I avoid all that RPC machinery?
In my previous post I talked about the message tunneling problem, where data that doesn’t fit the distributed type system are forced through the system by packing them into a type such as string or sequence of octets. Since Protocol Buffers require you to “hook up a generated service class to any arbitrary RPC implementation of your choice,” it’s likely that you’re going to run into this tunneling problem. For example, if you want to send this stuff over IIOP, you’re likely going to send the marshaled protobufs as Common Data Representation (CDR) sequences of octet. You’re thus unavoidably paying for marshaling twice: once at the protobuf level for the protobuf itself, and then again at the CDR level to marshal the sequence of octet containing the protobuf. Any worthwhile IIOP/CDR implementation will be very fast at marshaling sequences of octet, but still, overhead is overhead.
But there are other problems too. What about errors? If something goes wrong with the RPC call, how do I figure that out? The answer appears to be that you call the
RpcController to see if there was a failure, and if so, call it again to get a string indicating what the failure was. A string? This implies that I not only have to write code to convert exceptions or status codes from the underlying RPC implementation into strings, but also write code to convert them back again into some form of exception, assuming my RPC-calling code wants to throw exceptions to indicate problems to the code that calls it.
What about idempotency? If something goes wrong, how do I know how far the call got? Did it fail before it ever got out of my process, or off my host? Did it make it to the remote host? Did it make it into the remote process, but failed before it reached the service implementation? Or did it fail sometime after the service processed it, as the response made its way back to me? If the call I’m making is not idempotent, and I want to try it again if I hit a failure, then I absolutely need to know this sort of information. Unfortunately, Protocol Buffers supplies nothing whatsoever to help with this problem, instead apparently punting to the underlying RPC implementation.
Still more problems: the
RpcController offers methods for canceling remote calls. What if the underlying RPC package doesn’t support this? Over the years I’ve seen many that don’t. Note that this capability impacts the idempotency problem as well.
Another question: what about
service references? As far as I can see, the protobuf language doesn’t support such things. How can one service return a message that contains a reference to another service? I suspect the answer is, once again, data tunneling — you would encode your service reference using a form supported by the underlying RPC implementation, and then pass that back as a string or sequence of bytes. For example, if you were using CORBA underneath, you might represent the other service using a stringified object reference and return that as a string. Weak.
All in all, the Protocol Buffers
service abstraction is very leaky. It doesn’t give us exceptions or any ways of dealing with failure except a human-readable string. It doesn’t give us service references, so we have no way to let one service refer to another within a protobuf message. We are thus forced to work in our code simultaneously at both the Protocol Buffers level and also at the underlying RPC implementation level if we have any hope of dealing with these very-real-world issues.
My advice to Google, then, is to just drop all the
service and RPC stuff. Seriously. It causes way more problems than it’s worth, it sends people down a fundamentally flawed distributed computing path, and it takes away from what is otherwise a nice message format and structure. If Google can’t or won’t drop it, then they should either remove focus from this aspect by relegating this stuff to an appendix in the documentation, or if they choose to keep it all at the current level of focus, then they should clarify all the questions of the sort I’ve raised here, potentially modifying their APIs to address the issues.
I’ve gotten a few emails asking me to blog my opinion of Google’s Protocol Buffers. Well, I guess I pretty much share Stefan’s opinion. I certainly don’t see this stuff providing anything tremendously innovative, so as with Cisco Etch, it seems to me to be just another case of NIH.
Which, by the way, brings up another problem, the same one that plagues CORBA, COM/DCOM, WSDL-based services, and anything that relies on a shared definition file that is used for code-generation purposes, what I often call The Myth of the One True Schema.
Indeed. Usually when one points this out, those who disagree with you come back with, “Oh, that doesn’t matter — you can just send whatever data you want as a string or as an array of bytes!” Having been forced to do just that numerous times back in my CORBA days, I know that’s not a good answer. You have a bunch of distributed infrastructure trying to enforce your One True Schema Type System, yet you go the extra mile to tunnel other types that don’t fit that schema through it all, and the extra layers can end up being especially complicated and slow.
The second point I’ll quote from Ted says it all:
Don’t lose sight of the technical advantages or disadvantages of each of those solutions just because something has the Google name on it.
Most excellent advice.
There’s one last thing I’ll quote. This one’s not from Ted, but directly from the Protocol Buffers documentation:
For example, you might implement an
RpcChannelwhich serializes the message and sends it to a server via HTTP.
[Update: more opinions — and some questions — in my next post.]