IDLs vs. Human Documentation

January 16th, 2008  |  Published in code generation, CORBA, documentation, IDL, WSDL  |  13 Comments

Patrick Mueller responded to my previous blog posting on interface definition languages, and I wanted to comment on his response. Long ago Patrick was involved in defining the Smalltalk bindings for CORBA IDL, so he’s a CORBA veteran like me, and in the big picture we agree on many things. It’s nice having him cast a critical eye on this stuff.

Note that Patrick mostly talks about data schemas, whereas my posting talks only of interface definition languages. These are two very different things, which I’ve noted in comments on his blog. In a reply comment he said they’re both metadata, which is true, but still, they’re very separable. REST depends heavily on data definitions, but it doesn’t require specialized interface definitions because it promotes a uniform interface. For data definition, REST relies on and promotes media/MIME types, and the standardization of such data definitions is critical to allowing independently-developed consumers and providers to interact correctly. I doubt Patrick and I really disagree on this last point.
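To make that split concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not code from Patrick’s post or any REST framework, and the media type `application/vnd.example.order+json` and the helpers `build_request` and `decode` are hypothetical. The client needs no generated stub per service, because the methods come from HTTP itself; what it must share with the provider is how to interpret the media type.

```python
import json
import urllib.request

# Hypothetical media type standing in for a standardized data definition.
ORDER_TYPE = "application/vnd.example.order+json"


def build_request(uri: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for any resource using HTTP's fixed methods.

    No per-service stub is involved: only the URI, the method, and the
    requested media type vary. The request is constructed here, not sent.
    """
    return urllib.request.Request(uri, method=method, headers={"Accept": ORDER_TYPE})


def decode(content_type: str, body: bytes) -> dict:
    """Interpret a representation according to its media type.

    This is the point where consumer and provider must agree: the media
    type, not a service-specific interface definition, says what the bytes mean.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != ORDER_TYPE:
        raise ValueError(f"unsupported media type: {media_type}")
    return json.loads(body)
```

For example, `build_request("http://example.com/orders/7").get_method()` returns `"GET"`, and the same function works unchanged for any other resource URI; only `decode` embodies knowledge specific to the (hypothetical) order format.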

One area where we apparently do disagree, though, is in the area of documentation. In my previous post I said that users of services ultimately rely on human-generated and human-readable documentation, not interface definition languages, to ensure their consuming applications interact correctly with those services. Patrick commented:

The documentation? What documentation? I’m picturing here that Steve has in mind a separate document (separate from the code, not generated from the code) written by a developer. But human generated documentation like this is still schema, only it’s not understandable by machines, pretty much guaranteed to get out of sync with the code, probably incomplete and/or imprecise. Not that machine generated schema might fare any better, but it couldn’t be any worse.

But there are more problems with this thought. The notion of hand-crafted documentation for an API is quaint, but impractical if I’m dealing with more than a handful of APIs.

I understand what Patrick’s saying here. Yes, documentation can get stale and out of sync. Still, I disagree. I’ve been near interface definition languages for at least 20 years now, and never once — not even once — have I seen anyone develop a consuming application without relying on some form of human-oriented documentation for the service being consumed. Such documentation might be as simple as a conversation with a developer across the hall, or reading comments in the definition language file itself, or might be from a README, email, a web page, a wiki, a Word document, a PDF, or a whole formal specification. I mean, what if the OMG had published only the ORB and Object Services IDL interfaces without the accompanying reams of human-oriented description and definition? Or if WSDL were enough, why the need for so many pages of human-oriented WS-* documentation?

Like I said in my previous post, interface definition languages exist for machines to generate code. They’re totally inadequate, though, for instructing developers on how to write code to use a service. The need for human documentation in this context isn’t quaint or impractical at all — it’s simply reality.
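As a rough illustration of that gap, consider a hypothetical `AccountService` written as a Python `typing.Protocol`, standing in here for an interface definition. The service, its `transfer` operation, and the `machine_view` helper are all invented for this sketch. A tool can extract names and types from the definition; the things a developer actually needs to get right (preconditions, error conditions, whether a retry is safe) exist only in the human-oriented text.

```python
import inspect
from typing import Protocol


class AccountService(Protocol):
    """Hypothetical service; a stand-in for an IDL interface."""

    def transfer(self, source: str, target: str, amount_cents: int) -> str:
        """Move money from one account to another.

        source and target must name existing, distinct accounts, and
        amount_cents must be positive; otherwise ValueError is raised.
        Returns a transfer ID once the transfer is committed. The call is
        not idempotent: retrying after a timeout may transfer twice.
        """
        ...


def machine_view(iface: type, operation: str) -> dict:
    """Return what a code generator sees: parameter names and types, return type."""
    sig = inspect.signature(getattr(iface, operation))
    params = {name: p.annotation for name, p in sig.parameters.items() if name != "self"}
    return {"params": params, "returns": sig.return_annotation}
```

Calling `machine_view(AccountService, "transfer")` yields only `{"params": {"source": str, "target": str, "amount_cents": int}, "returns": str}`. Nothing in that result warns that `transfer` is unsafe to retry; a developer learns that only from the docstring.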

Responses

  1. Blog » Blog Archive » IDLs vs. Human Documentation says:

    January 16th, 2008 at 1:28 am

    […] unknown wrote an interesting post today on IDLs vs. Human Documentation. Here’s a quick excerpt: Patrick Mueller responded to my previous blog posting on interface definition languages, and I wanted to comment on his response. Long ago Patrick was involved in defining the Smalltalk bindings for CORBA IDL, so he’s a CORBA veteran … […]

  2. Vinay says:

    January 16th, 2008 at 10:00 am

    What’s your take on something equivalent to javadoc/doxygen? It makes maintaining the documentation much easier compared to, say, a text file or Word document that has a completely separate identity.

  3. steve says:

    January 16th, 2008 at 11:40 am

    Vinay: my own opinion of such approaches is that they work pretty well. I think the key to documenting a particular component, subsystem, or service is having a good overview comment to explain the big picture of what the thing does, along with having good details for each public operation — arguments going in, arguments coming out, return value, and relevant details of what successful outcomes mean as well as details about potential error conditions and what they mean. This is an inexpensive and simple solution that I’ve found works really well in practice, at least for the programming languages I’ve applied it to. I’ve never applied it to CORBA IDL but there’s no reason it couldn’t work well there, either.

  4. Patrick Mueller says:

    January 16th, 2008 at 11:41 am

    I don’t really distinguish all that much between annotation-based meta-data and IDL, though there are obvious differences. From 10K feet, they both (can) provide the same sort of information.

    So, you’ve just posed my question in a different manner; I’d expect to see the same response. There’s a bigger philosophic issue we need to settle on first, besides suggesting an alternative to IDL. :-)

  5. Patrick Mueller says:

    January 16th, 2008 at 12:13 pm

    my response

  6. John D. Heintz says:

    January 16th, 2008 at 2:39 pm

    Steve,

    I think I side with Patrick in general that there is something useful to be defined, but I’m not the least bit happy with the names right now.

    IDL: REST clearly promotes a single interface and trying to define more is bad karma.

    Data Schema: Media/MIME types do this very well.

    Hypermedia MIME types embed not just data but also the actions needed to do things via some forms mechanism.

    For example: a shopping agent wants to buy an item (somehow) already placed in the shopping cart.
    – How does this agent pick the “checkout” form (as opposed to the “logout” form)?
    – How does this same shopping agent pick out the “checkout” form on a different shopping site?
    – What happens when the site moves to an “a” link to get the checkout page, that then contains the “checkout” form?

    More importantly: What classes of server-side evolutionary changes will this same client be resilient to?

    Why care about these flexibilities? To maintain independent and evolutionary changes in the distributed system. This key goal of REST is being squished when moving from human to machine interactions. It is a growing long-term burden that RESTful services need to maintain _exactly_ what they have already documented or encoded into the Service-Python-1.0.zip files that clients use.

    The REST community already clearly believes that URIs should not be hard-coded, but rather discovered. That’s a good start, but what else?

    I submit that the following are appropriate for a REST-ful interaction to also support in an evolutionary manner:
    – additional data parameters (the server should be able to add a hidden “requestID” parameter to a form, for example)
    – changes to the HTTP verb (POST to PUT)
    – change to multi-step processing (server moves from single POST to reliable POST-then-PUT)

    What definitions and interactions would be defined, shared, and executed at runtime to support that scenario?

    REST and HTTP already define a single interface, but the free variable is MIME types (especially hypermedia ones). I predict that over the next 10 years the most important progress will be how to build and share good extensible hypermedia documents and types.

    Some details to back this up:
    – XSD is being worked on for “extensible versioning”
    – URI Templates, WADL, Web Forms, HTML5 all seem to be building blocks that just could snap together the right way
    – HTML microformats are growing rapidly, but don’t provide a namespaced partial typing and interactions definition
    – This quote from the REST thesis, section 5.3.3 The Data View:
    “The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations.”

    A RESTful client application should be able to rely on some interaction engine to make application coding easier. (Not code-generation! That’s certainly not what I mean.)

    The shopping client agent, given its own data and the current representation state, decides “now I should checkout”. From there the “engine” can use _something_ to follow the right links and trigger the right transfer actions.

    John (too much rambling…) Heintz

  7. Markus Pohlmann says:

    January 17th, 2008 at 7:31 am

    Hi Steve,

    In 2001 I worked on a huge CORBA integration project at a DP center in the finance industry.

    IDL was our project’s Esperanto. It was the first time that Javarians and Cobolians used the same language to talk to each other. The social effects were wonderful.

    From my point of view, using a (technical) human-readable interface language is a key factor in large-scale software integration.
    It’s not only software components that have to rely on each other, but also humans who have to agree on an interface.

    Regards

    Markus

  8. steve says:

    January 17th, 2008 at 9:19 am

    Patrick and John: I don’t necessarily disagree. I guess the original question I was writing about was, “Does REST require an interface definition language?” And the answer to that is a resounding “no.” The next question, then, is, “Can some RESTful applications benefit from an interface definition language?” I think you both give good reasons why the answer to this question is “yes.”

    But here’s my fear: introducing the wrong interface definition language takes us back to that code generation place again, and people start wanting magical frameworks that defy the laws of distributed systems by erroneously turning their Java or .NET classes into web resources by attaching a few annotations here and there, and they want to do things backward by generating their interface definitions from their code. That is a Bad Place and we don’t want to go there. I believe this is what keeps the REST community from treading down the interface definition path very often; Marc and the WADL folks have been taking it very very slowly, for example.

  9. steve says:

    January 17th, 2008 at 9:58 am

    Markus: I have no doubt that what you say is true. I also have no doubt that you relied on human-oriented documentation that went beyond what your IDL provided in order to more adequately inform your developers of the semantics and side effects of calling your CORBA services.

    But the fact remains that there’s no need for an IDL-like language for REST, as the REST interface is uniform and is already fully defined. If you were going to create an analogous language for REST, then as John explains above, you’d want the language to help define interaction details based firmly on RESTful foundations, rather than focusing on object interfaces, operations, parameters, and inheritance relationships as OMG IDL does.

  10. John D. Heintz says:

    January 17th, 2008 at 4:09 pm

    Steve: I agree with how you put it. REST systems don’t require more than a uniform interface and MIME types, but they could sometimes benefit from processable definitions.

    I also agree with your fear of code generation, magical annotations that distribute systems, and ignoring lessons already learned.

    Part of what I think is the problem: the very name “interface definition”. RESTful systems only have one interface, so entertaining the idea of creating new ones is confusing at the least.

    Also, the most interesting and intricate parts are left out of most IDLs (CORBA, WSDL, Java/C# interfaces, …): states, transitions, pre/post conditions, and everything else that is maybe described in a blurb of text.

    My suggestion for what we should be talking about is a “state definition” language. REST is “State Transfer” with a uniform interface. Providing an extensible, composable, shared semantic between providers and consumers to document high level states and resources makes sense to me.

    The tooling support that an “SDL” could provide is to abstract away some of the mechanics of HTTP interactions, but not the human coded logic. For example: a shopping agent needs some specific logic to decide when to “checkout” and buy things. That is the stuff developers need to think about and write. Once that decision is made finding the right URI, verb, and content can be assisted by an engine looking for meta-data corresponding to that shared SDL.

  11. Patrick Mueller says:

    January 18th, 2008 at 1:24 am

    I also don’t think meta-data should be REQUIRED, but if it’s there, it can provide a benefit. And I realize there’s a cost (in most cases): keeping the code and meta-data in sync. That cost is not insignificant, though testing can mitigate it.

    That’s a good first step – agreement on “not required” – I have arguments with people all the time who claim that meta-data is bad – inherently bad – and their rationale, oddly enough: “because that’s WS-* all over again”. Putting me in an odd position of having to say something positive about WS-* :-)

    I also agree there are some bad, bad things that can happen with reverse engineering the meta-data from an implementation, generating magical wrappers, etc. That’s actually a shame, because, if you know what you’re doing, it’s obviously quite possible to infer metadata from an implementation (that you’ve written), and generate useful wrappers (for you to use). The problem is you have to know what you’re doing, and most people don’t.

  12. To machine-document or not machine-document … | sun says:

    January 18th, 2008 at 1:38 am

    […] off without an interface definition language. He is especially picking up on Steve Vinoski’s IDLs vs. Human Documentation post, which emphasizes human readable documentation over […]

  13. Peter Williams - RESTful Service Discovery and Description says:

    January 22nd, 2008 at 4:00 pm

    […] There has been a great deal of discussion regarding RESTful web service description languages this week. The debate is great for the community but I think Steve Vinoski has it basically right […]