CORBA

Defending Something Other Than RPC

May 24th, 2008  |  Published in CORBA, distributed systems, HTTP, messaging, objects, RPC

Josh Haberman takes me to task for my previous posting:

Steve Vinoski has come out very vocally against RPC in the last few days…

Actually, I’ve been saying similar things for years now, Josh, not just the last few days. For example, I noted problems with RPC in my Mar/Apr 2008 IEEE Internet Computing column entitled “Demystifying RESTful Data Coupling.” I noted problems with RPC in my Sep/Oct 2005 column entitled “RPC Under Fire.” I noted problems with RPC in my Jul/Aug 2002 column entitled “Web Services Interaction Models, Part 2: Putting the ‘Web’ Into Web Services.”

His blog entry basically makes fun of Cisco for inventing/releasing another RPC system. It’s not clear exactly what he thinks they should have done instead.

I think my posting pretty clearly implies that Cisco should have avoided writing their own and instead should have reused something that already exists.

What is strange about this criticism is that tons of technology companies have developed their own RPC system — Facebook and Cisco publicly, and other technology companies I am familiar with in a not-so-public way. Guess what: large commercial distributed systems are built largely on RPC. Is he arguing that all of the engineers at these companies simultaneously got the bad idea of investing in something they don’t need? If RPC is such a bad idea, then why is everybody doing it?

Is everybody really doing it? Are large commercial distributed systems really built largely on RPC? I’ve seen some non-trivial CORBA-based deployments over the years, but in my experience large systems are built using approaches other than RPC. Like the Web, which isn’t RPC. Like email, which isn’t RPC. Like pub/sub enterprise messaging systems, which aren’t RPC.

Let’s consider what an RPC actually is. The term is often misused to mean “a synchronous call to another system over the network,” but that’s not what an RPC is. An HTTP request, for example, is synchronous, yet it is not an RPC. RPC, rather, is a specific approach to developing networked applications in which local calls wrap and hide operations that are actually carried out on another system across the network. For starters, let’s check Wikipedia:

Remote procedure call (RPC) is a technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction. That is, the programmer would write essentially the same code whether the subroutine is local to the executing program, or remote.

Next, let’s check RFC 707, where RPC comes from, in which James E. White specifically proposed a procedure call model for networked applications designed to hide the network, and thereby allow developers to use familiar approaches to developing applications that happened to perform network operations. Quoting from that RFC:

Ideally, the goal of both the Protocol and its accompanying RTE is to make remote resources as easy to use as local ones. Since local resources usually take the form of resident and/or library subroutines, the possibility of modeling remote commands as “procedures” immediately suggests itself. The Model is further confirmed by the similarity that exists between local procedures and the remote commands to which the Protocol provides access. Both carry out arbitrarily complex, named operations on behalf of the requesting program (the caller); are governed by arguments supplied by the caller; and return to it results that reflect the outcome of the operation. The procedure call model thus acknowledges that, in a network environment, programs must sometimes call subroutines in machines other than their own.

and also:

The procedure call model would elevate the task of creating applications protocols to that of defining procedures and their calling sequences. It would also provide the foundation for a true distributed programming system (DPS) that encourages and facilitates the work of the applications programmer by gracefully extending the local programming environment, via the RTE, to embrace modules on other machines. This integration of local and network programming environments can even be carried as far as modifying compilers to provide minor variants of their normal procedure-calling constructs for addressing remote procedures (for which calls to the appropriate RTE primitives would be dropped out).
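To make “hiding the network” concrete, here’s a minimal sketch using Python’s standard xmlrpc modules. The port number and the add operation are made up for illustration; the point is simply that the client-side call reads exactly like a local one.

```python
# Minimal sketch of the style RFC 707 describes: the remote call is written
# just like a local call. The port number and method name are illustrative.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# --- server side ---
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# --- client side ---
calc = ServerProxy("http://localhost:8000")
print(calc.add(2, 3))  # looks like an ordinary method call; the network is hidden
```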

Josh continues:

Yes, on a network sh*t happens, and no sane RPC system will try to hide this from you.

As you can see from the original definition of RPC, something called an RPC that doesn’t hide the network is, by definition, not an RPC. As I said above, the term is unfortunately often misused to mean “synchronous messaging,” and that incorrect usage seems to be what Josh is defending. Josh then says:

But then again, I don’t know of any RPC system that tries to hide this from you except possibly CORBA.

That’s not correct either. What CORBA actually does is make everything appear remote, even local objects, but it does so in a way that allows object request broker (ORB) implementations to bypass much of the overhead of remote invocations when the ORB knows that a target object is local. Still, not all of that overhead can be eliminated, due to object lifecycle and method dispatching requirements, so such collocated calls are never quite as fast as true local calls. DCE also treats services as always being remote, but last I checked it included no local bypass optimizations (though a variant called OODCE once did, IIRC). Either way, what’s important with these systems is that calls within your code look just like any other calls within your code, whether they’re invoking remote operations or not. And that’s RPC.
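To illustrate the collocation idea, here’s a toy sketch in Python. It is not real CORBA or any actual ORB API; the ToyORB, Proxy, and AccountServant names are invented for the example. The client invokes through a proxy either way, and the proxy decides whether it can dispatch in-process or has to marshal a request.

```python
# Toy illustration (not a real ORB) of the collocation optimization described
# above: the proxy checks whether the target lives in the same process and,
# if so, dispatches directly instead of marshaling a request over the wire.

class ToyORB:
    def __init__(self):
        self._local_servants = {}

    def register(self, key, servant):
        self._local_servants[key] = servant

    def find_local_servant(self, key):
        return self._local_servants.get(key)


class Proxy:
    def __init__(self, orb, key):
        self.orb, self.key = orb, key

    def invoke(self, operation, *args):
        servant = self.orb.find_local_servant(self.key)
        if servant is not None:
            # Collocated: skip marshaling and the transport, but still pay for
            # the lookup and dynamic dispatch, so it's never a plain local call.
            return getattr(servant, operation)(*args)
        raise NotImplementedError("remote path: marshal the request and send it")


class AccountServant:
    def balance(self):
        return 42


orb = ToyORB()
orb.register("Account/1", AccountServant())
account = Proxy(orb, "Account/1")
print(account.invoke("balance"))  # the calling code is the same either way
```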

Regarding versioning problems, Josh says:

But any RPC framework worth its salt makes it possible to have different interface versions interoperate. Adding a new parameter? No problem, old servers simply won’t see it. Completely changing the semantics of your call? No problem — just give the new call a new name.

Yes, Josh, there are generally ways to do versioning in such systems, but they’re not very good. CORBA includes some facilities to help with versioning, but in practice they don’t help much. Both COM and CORBA promoted interface inheritance and runtime interface negotiation (called “narrowing” in CORBA) as a way to do versioning, which works, but only for a restricted set of changes. Add a parameter to an existing call? Sorry, no can do, unless your marshaling format carries complete information for the entire call, including parameter names, types, and positions, and also versions each parameter. Systems like CORBA, DCOM, and DCE specifically do not do that, because of the overhead it would add whether a given application used it or not, and in CORBA’s case also because of the interference it would cause for local dispatching optimizations. All in all, versioning is hard, not only for RPC but for distributed systems in general.
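Here’s a quick sketch of why that is. The encodings below are simplified stand-ins, not actual CDR or NDR, but the problem is the same: with purely positional marshaling, nothing on the wire names or versions the parameters, so an old server cannot cope with a request built from a newer interface definition.

```python
# Simplified illustration of positional marshaling, in the spirit of CDR/NDR.
import struct

# v1 operation: transfer(amount: int32)
v1_request = struct.pack("!i", 100)

# v2 adds a parameter: transfer(amount: int32, currency: int32)
v2_request = struct.pack("!ii", 100, 978)

# A v1 server unpacks exactly what its IDL told it to expect.
amount, = struct.unpack("!i", v1_request)   # fine
try:
    struct.unpack("!i", v2_request)         # wrong size for the old signature
except struct.error as err:
    print("v1 server can't decode the v2 request:", err)
```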

Middleware and distributed systems veterans are well aware of arguments like the ones I’ve made recently in my blog and elsewhere, and in various publications over the years; among us they’ve been common knowledge for a long time.

Cisco’s system is not available yet, but when it comes out, I’m quite certain you’ll find, Josh, that it’s the same old thing, just repackaged in a new box.

Father of CORBA – Not

April 21st, 2008  |  Published in CORBA, distributed systems

I see that at least one person on the planet believes I’m the “Father of CORBA.” I can certainly understand why people would think that, but I can definitely say that it’s inaccurate.

In 1989 Hewlett-Packard bought Apollo Computer, where I worked as a diagnostics engineer. Five years prior to that I started my career at Texas Instruments as an electrical engineer working on integrated circuits, and ended up having to write a lot of testing software despite having no software training whatsoever, except for a freshman class in BASIC required for all engineering majors. I found I really liked software, though, so I joined Apollo because the job there was half hardware and half software.

By late 1990, though, Hewlett-Packard politics had just about killed the group I worked in, and my manager told me I’d be smart to find myself something else to work on before I was forced to do so. I had been developing a hardware debugger in C++ that involved distributed computing, so I looked around the former Apollo site for any groups using C++ in distributed systems. Turned out Jim Waldo was leading such a group — they were building the first Object Request Broker (ORB). I joined them in January 1991, which was 6 months before the first version of the CORBA spec was published. Ken Arnold was also part of that group. I joined just as they wrapped up the first ORB prototype.

One could argue, therefore, that Jim Waldo is the Father of CORBA, since he led the development of the first ORB. Alternatively, one could argue that Joe Sventek is the Father of CORBA, since he was the editor of the first CORBA specification (Joe was also at HP, though he worked in a different group located in California). However, while they both played important roles in initially defining CORBA, there is to the best of my knowledge no single person who can be called the Father of CORBA. Rather, it was definitely a “design by committee” effort, and I certainly don’t count as one of the fathers since at that time I wasn’t even part of the committee.

BTW, here’s a bit of trivia: for those of you who remember Cliff Stoll’s “Stalking the Wily Hacker” and “The Cuckoo’s Egg” publications, documenting his successful effort to identify who was hacking into computer systems at Lawrence Berkeley Laboratory, Joe Sventek’s account was one that the hacker used to gain access. Joe explained to me that he had been away from the laboratory for quite a while, residing in the U.K. I believe, so red flags went up when there was activity in his account.

InfoQ Interview

February 26th, 2008  |  Published in commentary, conferences, CORBA, distributed systems, dynamic languages, erlang, HTTP, interview, productivity, REST

When I spoke at QCon San Francisco last November, Stefan Tilkov interviewed me, and the video is now available on InfoQ.com.

We covered a range of topics: CORBA, dynamic languages, REST, distribution, concurrency, Erlang. Stefan asked some great questions, and I hope I gave some worthwhile answers. Thanks again, Stefan.

More REST and IDL

January 20th, 2008  |  Published in CORBA, HTTP, IDL, REST, services, WSDL

Regarding the REST and IDL discussion, Joe Gregorio already wrote an excellent explanation six months ago. Perhaps the rest of us should have just linked to it to begin with and avoided wasting our time rehashing it all.

But then again, rehashing is fun! :-) On the same topic, I agree with much of what Dare said, except for this:

When building services with WS-*, you have a WSDL to describe your methods & expected inputs/outputs and XSD schema(s) to describe the schemas for said inputs/outputs. When building a RESTful Web Service, the need for both of these documents does not go away regardless of how often you repeat the phrase “uniform interface”.

I still disagree. The HTTP verb set defines a RESTful uniform interface. When comparing CORBA IDL or WSDL to HTTP web services, that verb set must be considered because it’s the only thing that you can directly relate to IDL and WSDL interfaces. Otherwise, you’re talking apples and oranges. The uniform HTTP interface is the only way for applications to interact with web resources in the same sense that CORBA applications interact with CORBA objects, and WSDL applications interact with WSDL services.

Much of the typical IDL and WSDL definition is devoted to defining one or more specialized interfaces consisting of specialized methods/operations and specialized data types to pass across them. For web resources that use the uniform interface, however, there’s simply no need whatsoever to define interfaces like that. What are you going to do, define GET again and again for each of your resources, each time with exactly the same semantics already specified in RFC 2616? Obviously not.
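To put that in concrete terms, here’s a small sketch using Python’s urllib. The example.com URIs are placeholders, not real services; the point is that the same GET, with the semantics RFC 2616 already defines, applies to every resource, and only the URIs and media types vary.

```python
# Sketch of the uniform interface: one GET, already specified by RFC 2616,
# works against any resource. The URIs below are illustrative placeholders.
from urllib.error import HTTPError
from urllib.request import Request, urlopen

for uri in ("http://example.com/orders/42", "http://example.com/customers/7"):
    req = Request(uri, headers={"Accept": "application/xml"})
    try:
        with urlopen(req) as resp:  # same verb, same semantics, different resource
            print(uri, resp.status, resp.headers.get("Content-Type"))
    except HTTPError as err:        # the placeholder URIs will just return 404
        print(uri, err.code)
```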

It’s easy to see that Joe’s OpenSearch document bears no semantic resemblance to a CORBA IDL or WSDL definition. Similarly, an AtomPub service document is nothing like IDL or WSDL either. Both of those documents essentially inform you of service URIs and media types, but they don’t define methods or operations. They don’t have to, because of the HTTP uniform interface. That’s important, and it’s why I disagree with Dare’s point quoted above.

Consider the mindset of CORBA and WS-* developers. They use systems that force them to think about all their service endpoints in terms of specialized interfaces. Given how much time I’ve spent in that world (and don’t forget, I still use CORBA too), I know for certain that the concept of the uniform interface is one of the big items that trips these developers up when they try to figure out what REST is about. It’s therefore important to keep the uniform interface as part of the conversation.

IDLs vs. Human Documentation

January 16th, 2008  |  Published in code generation, CORBA, documentation, IDL, WSDL

Patrick Mueller responded to my previous blog posting on interface definition languages, and I wanted to comment on his response. Long ago Patrick was involved in defining the Smalltalk bindings for CORBA IDL, so he’s a CORBA veteran like me, and in the big picture we agree on many things. It’s nice having him cast a critical eye on this stuff.

Note that Patrick mostly talks about data schemas, whereas my posting talks only of interface definition languages. These are two very different things, which I’ve noted in comments on his blog. In a reply comment he said they’re both metadata, which is true, but still, they’re very separable. REST depends heavily on data definitions, but it doesn’t require specialized interface definitions because it promotes a uniform interface. For data definition REST relies on and promotes media/MIME types, and the standardization of such data definitions is critical to allowing independently-developed consumers and providers to interact correctly. I doubt Patrick and I really disagree on this last point.

One area where we apparently do disagree, though, is in the area of documentation. In my previous post I said that users of services ultimately rely on human-generated and human-readable documentation, not interface definition languages, to ensure their consuming applications interact correctly with those services. Patrick commented:

The documentation? What documentation? I’m picturing here that Steve has in mind a separate document (separate from the code, not generated from the code) written by a developer. But human generated documentation like this is still schema, only it’s not understandable by machines, pretty much guaranteed to get out of sync with the code, probably incomplete and/or imprecise. Not that machine generated schema might fare any better, but it couldn’t be any worse.

But there are more problems with this thought. The notion of hand-crafted documentation for an API is quaint, but impractical if I’m dealing with more than a handful of APIs.

I understand what Patrick’s saying here. Yes, documentation can get stale and out of sync. Still, I disagree. I’ve been near interface definition languages for at least 20 years now, and never once — not even once — have I seen anyone develop a consuming application without relying on some form of human-oriented documentation for the service being consumed. Such documentation might be as simple as a conversation with a developer across the hall, or reading comments in the definition language file itself, or might be from a README, email, a web page, a wiki, a Word document, a PDF, or a whole formal specification. I mean, what if the OMG had published only the ORB and Object Services IDL interfaces without the accompanying reams of human-oriented description and definition? Or if WSDL were enough, why the need for so many pages of human-oriented WS-* documentation?

Like I said in my previous post, interface definition languages exist for machines to generate code. They’re totally inadequate, though, for instructing developers on how to write code to use a service. The need for human documentation in this context isn’t quaint or impractical at all — it’s simply reality.