HTTP

New Column on Webmachine

March 15th, 2010  |  Published in column, erlang, functional programming, HTTP, REST, web

“The Functional Web” column is finally back, this time with a column about Webmachine co-authored with Justin Sheehy. The column title is Developing RESTful Web Services with Webmachine, and you can follow that link to retrieve the PDF.

Webmachine is a highly innovative web application framework, and it can teach you a great deal about the specifics of HTTP and the details of REST. It’s also written in Erlang, which continues to be my favorite programming language of all time because of its incredible practicality, utility, and elegance.

My column hiatus was due to extreme startup workload, which for better or worse is showing no sign of letting up anytime soon. But it’s nice to get the column back on track for the March/April Internet Computing issue, and one of my goals is to avoid missing any more issues this year. Many thanks to Justin for his contribution to this issue of the column.

Yaws 1.85 Released

October 19th, 2009  |  Published in HTTP, web, yaws

Today Klacke announced Yaws 1.85, mainly a bugfix release. You can find the list of changes and fixes at that link, but one addition in this release I wanted to point out was our new streamcontent_from_pid feature, which allows your server application code to temporarily take over the client connection socket from Yaws, thus allowing you to feed data directly to the socket without first passing it back through Yaws. Could be just the ticket for long-polling (Comet) applications, for example.

Prior to this release, the closest feature Yaws provided for this sort of operation was the streamcontent capability. It is still very useful: it lets an Erlang process deliver data back into Yaws, which in turn sends it to the client in HTTP chunked transfer mode. For large file resources like video files or install tarballs, for data sources where content arrives at your server in batches from a separate back-end source, or generally for resources whose sizes are not known up front, streamcontent is perfect because it lets you transfer the data back into Yaws in chunks, at your leisure and without having to copy all the data at once. Still, the data has to pass back through Yaws, which converts it to HTTP chunks and writes it to the socket, and chunked transfer is your only choice in this case.

With streamcontent_from_pid, you reply to the out/1 upcall from Yaws with the HTTP reply headers and with the following special return tuple:

{streamcontent_from_pid, MimeType, StreamPid}

This tells Yaws that you wish to have process StreamPid take over the client socket in order to send data of media type MimeType directly back to the client. Yaws uses MimeType to set the HTTP Content-Type header, and then after sending that and all the other HTTP headers back to the client, it turns control of the client socket over to StreamPid. The code running within StreamPid must handle the following messages from Yaws:

  • {ok, YawsPid} tells StreamPid that it can proceed with using the socket. The socket is present in the original Arg variable passed to your out/1 function and can be retrieved via Arg#arg.clisock.
  • {discard, YawsPid} tells StreamPid that it shouldn’t send any data on the socket, for example because the client request was an HTTP HEAD request and so there is no response body.

To send data, your code can choose to send chunked data or non-chunked data. To send the latter, first make sure you set the HTTP Content-Length header in your initial reply to Yaws, and then once Yaws calls back to your StreamPid process, just call:

yaws_api:stream_process_deliver(Socket, IoList)

or for chunked data:

yaws_api:stream_process_deliver_chunk(Socket, IoList)

where for both cases Socket is the client socket from the Arg and IoList is an iolist containing the data to be sent. The first case just calls gen_tcp:send and is there primarily so we can maybe someday add SSL socket support for this feature. For the second case Yaws will format the data for you for chunked transfer. Unless a Content-Length header is set, Yaws will assume you want chunked transfer and will set the Transfer-Encoding header appropriately. If you’re sending chunked data, make sure you send your final chunk using the following function:

yaws_api:stream_process_deliver_final_chunk(Socket, IoList)

so that Yaws knows to send the termination chunk to inform the client of the end of the transfer.

You can continue to call these functions from your StreamPid as frequently as you need to in order to deliver your data to the client. Meanwhile, the Yaws process that handed you the socket will just sit back and wait for you (non-blocking, of course). When you’re completely finished sending, just call:

yaws_api:stream_process_end(Socket, YawsPid)

to end the transmission and give control of the socket back to Yaws. At that point, your StreamPid can exit if it wishes.
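
As a rough sketch of how the pieces above might fit together, here is a minimal chunked-transfer example. The module name stream_sketch, the helper wait_for_yaws/1, and the payloads are hypothetical, and the include path for yaws_api.hrl depends on how Yaws is installed. Handing the socket back in the discard case is also an assumption; the description above does not spell out what StreamPid should do there beyond not sending data.

```erlang
-module(stream_sketch).
-export([out/1]).

%% Provides the #arg record; adjust the path to match your Yaws install.
-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
    %% The client socket comes from the Arg passed to out/1.
    Socket = Arg#arg.clisock,
    %% Hypothetical worker process that will take over the socket.
    StreamPid = spawn(fun() -> wait_for_yaws(Socket) end),
    %% No Content-Length header is set here, so Yaws assumes chunked transfer.
    {streamcontent_from_pid, "text/plain", StreamPid}.

wait_for_yaws(Socket) ->
    receive
        {ok, YawsPid} ->
            %% Yaws has sent the headers; the socket is ours until we hand it back.
            yaws_api:stream_process_deliver_chunk(Socket, <<"first part\n">>),
            yaws_api:stream_process_deliver_chunk(Socket, ["second ", "part\n"]),
            %% The final chunk lets Yaws send the termination chunk.
            yaws_api:stream_process_deliver_final_chunk(Socket, <<"done\n">>),
            yaws_api:stream_process_end(Socket, YawsPid);
        {discard, YawsPid} ->
            %% For example a HEAD request: send no body.
            %% Assumption: return control of the socket to Yaws here as well.
            yaws_api:stream_process_end(Socket, YawsPid)
    end.
```

For the non-chunked variant, the same shape should apply with a Content-Length header in the out/1 reply and yaws_api:stream_process_deliver/2 in place of the chunk calls.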

If you try out this feature, be sure to send feedback either to me or to the Yaws mailing list.

QCon London 2008 Presentation Video

April 9th, 2009  |  Published in conferences, distributed systems, HTTP, REST, reuse, web

You may have already seen this on InfoQ or on Stefan’s blog, but the video of my 2008 QCon London presentation “REST, Reuse, and Serendipity” is now available.

Here it is, just a little over a year after I gave that presentation, and REST continues to deliver extremely well for my work. For example, I finished a meeting a couple of hours ago about some client code that needs to interact with a particular part of my system via HTTP but wants XML instead of the JSON currently provided. Simple: it’s just a different representation of the same resources, and of course it wasn’t hard to guess months ago that such a need would eventually come down the pike, so fitting it into the system will be trivial. Can you imagine the hoops one would have to jump through with typical RPC-oriented systems for this case, where the marshaling format is typically tied to the protocol and you can’t change either one? You’d have to write a new service interface with new verbs and new messages and get the client side to use it, or write client-side wrappers around whatever you already have and ask the client programmers to somehow incorporate those wrappers into their code. Either way, there’s simply no chance of reusing existing agreements; instead, both sides require non-trivial specialization.

One problem I noticed, though, was that the client developers asked for a “REST-like interface” and also for a document listing all resource URIs, and for each one, the HTTP verbs that apply to it, the representations available from it, and what status codes to expect from invoking operations on it. Those two requests are sort of mutually exclusive, depending on what “REST-like” means; for a proper RESTful system, you don’t need a document like that, at least not the type of document they’re asking for.

RESTful Web Services Development Checklist

November 1st, 2008  |  Published in column, coupling, design, distributed systems, HTTP, REST, services

My Nov./Dec. Internet Computing column is now available. It’s entitled RESTful Web Services Development Checklist and, as its name implies, it covers some of the primary areas developers need to focus on to write good RESTful web services. These areas are:

  • Resources and their URIs
  • Applications and Hypermedia
  • Representations and Media Types
  • Methods
  • Conditional GET

Regarding the “Applications and Hypermedia” area, I feel Roy Fielding’s pain that many efforts labeled as being RESTful seem to completely ignore the hypermedia constraint. I believe many developers tend to miss this constraint because they’re so used to using libraries and frameworks that offer lots of entry points, and having knowledge of those entry points in the client normally isn’t that bad since the client and library/framework are tightly coupled into the same address space anyway. In a distributed system, though, this definitely does not hold true; when the client knows a bunch of entry points into the service, it ties the client to that service and inhibits their independent evolution.

Anyway, please read the column and let me know what you think, and thanks again to Stefan Tilkov for his helpful review of the draft.

Coincidentally I also feel Roy’s pain when it comes to writing about REST. He states:

I don’t try to tell them exactly what to do because, quite frankly, I don’t have anywhere near enough knowledge of their specific context to make such a decision.

So, when you find it hard to understand what I have written, please don’t think of it as talking above your head or just too philosophical to be worth your time. I am writing this way because I think the subject deserves a particular form of precision. Instead, take the time to look up the terms. Think of it as an opportunity to learn something new, not because I said so, but because it will do you some personal good to better understand the depth of our field.

Exactly.

Obviously, Roy is the ultimate REST authority, given that he defined it, so I’m not at all claiming to be anywhere near as authoritative about it as he is, yet I’ve also experienced what he says above. For example, consider this informal review of my columns I received a few months ago in a comment on someone else’s blog:

The articles of yours that I’ve read are…amorphous to me. They speak in generalities. I haven’t read an article where you sit down and write the same service using both REST and RPC and compare the two. When you speak in generalities, we can’t objectively evaluate any of the specific trade-offs between approaches… Arguments that happen at too abstract a level can’t go anywhere, because our positions aren’t specific enough for anyone to evaluate anybody else’s claims.

In other words, “since your columns don’t do my thinking and experimentation for me, they’re useless to me.” Hmm. Maybe I’m just old school, but I’d much rather understand mathematics than require someone to hold my hand while I blindly punch buttons on a calculator. As the old proverb goes, I’d much rather try to teach you to fish so you can feed yourself. As I state in this new column:

Whether developers of RESTful HTTP-based services write their code in IDEs or with simple text editors, and regardless of which programming languages they use, they must understand REST and HTTP fundamentals to succeed.

Mimeparse in Erlang

September 23rd, 2008  |  Published in code, erlang, HTTP, python, REST, Ruby, services

If you’re writing RESTful web services, you need to be able to parse HTTP Accept and Content-Type headers. The Accept header tells you what media types the client can handle, and Content-Type tells you what the client is sending when it invokes PUT or POST.

Several years ago Joe Gregorio wrote a really nice article explaining how to properly handle such parsing, and he implemented his ideas in the form of the mimeparse Python module. Later, others added Ruby and PHP versions.

In my Erlang web services work I’ve been doing just enough to parse my Accept headers, knowing full well my approach had holes in it and would eventually need to be done right. When I recently stumbled on Joe’s article and code, I decided to port it to Erlang and use it. Joe’s code is very clean and has accompanying unit tests, so the port was pretty easy. However, in the process of plugging it into my system I found a reasonably common case where the best_match function would return a wildcard match in preference to an exact match, so I added unit tests for that case and repaired the problem for all four languages.

I also found that Java’s URLConnection class, or maybe it’s HttpURLConnection, sends a plain “*” as a wildcard MIME type, which I don’t believe is legal (please correct me if I’m wrong), but since I figure there are probably more than a few clients out there based on that code, I modified the four mimeparse implementations to handle that as well, treating it as “*/*”.
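
To make the two fixes concrete, here is a small self-contained sketch of the idea. It is not the mimeparse code: the module accept_sketch and its functions are hypothetical, it ignores q-values and media type parameters entirely, and it assumes every input other than a bare “*” has the form type/subtype. It only shows a bare “*” being normalized to “*/*” and an exact match outranking a wildcard.

```erlang
-module(accept_sketch).
-export([normalize/1, best_match/2]).

%% Treat a bare "*" as the full wildcard "*/*".
normalize("*") -> "*/*";
normalize(Type) -> Type.

%% How specifically does Range match Supported?
%% 2 = exact, 1 = type/*, 0 = */*, -1 = no match.
specificity(Supported, Range) ->
    [SType, SSub] = string:split(Supported, "/"),
    [RType, RSub] = string:split(normalize(Range), "/"),
    if
        RType =:= SType, RSub =:= SSub -> 2;
        RType =:= SType, RSub =:= "*"  -> 1;
        RType =:= "*",   RSub =:= "*"  -> 0;
        true                           -> -1
    end.

%% Pick the supported type with the most specific match among Ranges.
%% Both lists are assumed non-empty; ties fall back to Erlang term order.
best_match(Supported, Ranges) ->
    Scored = [{lists:max([specificity(S, R) || R <- Ranges]), S}
              || S <- Supported],
    case lists:max(Scored) of
        {Score, Best} when Score >= 0 -> Best;
        _ -> none
    end.
```

With this sketch, accept_sketch:best_match(["application/json", "application/xml"], ["*", "application/xml"]) selects "application/xml", because the exact match scores higher than the wildcard that also covers JSON.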

All four implementations are available from the mimeparse project home page.