Firstly: HTTP 1.1 compliance is not easy. It's not *too* bad for an origin server, but if Twisted wants a compliant HTTP proxy module (even a non-caching one), there are a lot of requirements. Squid made a nice table of the 473 MUST/MAY/SHOULD [NOT]s, which is helpful.
Anyhow, as it's nearing workability, I thought I'd just write a bit about how it's structured. This is kinda rambling but oh well. ;)
There are 4 main classes on the path for handling an HTTP connection:
  HTTPFactory: a ServerFactory that creates an HTTPChannel object for each incoming TCP connection.
  HTTPChannel: keeps track of queued 'ChannelRequest' objects, and does some of the work of splitting the incoming data into distinct requests.
  ChannelRequest: handles all the low-level hop-by-hop behavior.
  Request: high-level request/response behavior.
The split off of ChannelRequest from bits of HTTPChannel and bits of Request is geared towards allowing a different request transport than normal HTTP. PB is one possibility. Also, it simplifies the Request object and gives it a cleaner API that only has to deal with the actual request, not the details of transfer encodings, pipelined connections, etc.
ChannelRequest provides the following methods for Requests to call:
    def writeIntermediateResponse(self, code, headers=None, code_message=None):
    def writeHeaders(self, code, headers, code_message=None):
    def writeData(self, data):
    def finish(self):
    def abortConnection(self):
Also the producer methods:
    def registerProducer(self, producer, streaming):
    def unregisterProducer(self):
Request provides the following callbacks that are called by ChannelRequest:
    def __init__(self, chanRequest, command, path, version, in_headers):
    def handleContentChunk(self, data):
    def handleContentComplete(self):
    def connectionLost(self, reason):
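To make the division of labour concrete, here's a rough sketch of the two sides talking to each other. FakeChannelRequest, EchoRequest and drive() are made-up names for illustration, the headers are a plain dict rather than a Headers object, and none of this is the actual implementation; only the method names and signatures come from the lists above.

```python
class FakeChannelRequest:
    """Hypothetical in-memory stand-in for ChannelRequest.

    It records what the Request asks it to send instead of touching a socket.
    """

    def __init__(self):
        self.log = []
        self.finished = False

    def writeIntermediateResponse(self, code, headers=None, code_message=None):
        self.log.append(("intermediate", code))

    def writeHeaders(self, code, headers, code_message=None):
        self.log.append(("headers", code, dict(headers)))

    def writeData(self, data):
        self.log.append(("data", data))

    def finish(self):
        self.finished = True

    def abortConnection(self):
        self.log.append(("abort",))


class EchoRequest:
    """Hypothetical Request that echoes the request body back to the client."""

    def __init__(self, chanRequest, command, path, version, in_headers):
        self.chanRequest = chanRequest
        self.method = command
        self.uri = path
        self.clientproto = version
        self.in_headers = in_headers
        self.chunks = []

    def handleContentChunk(self, data):
        # Called by the channel side for each piece of the request body.
        self.chunks.append(data)

    def handleContentComplete(self):
        # The body is complete: send the response through the channel side.
        body = b"".join(self.chunks)
        self.chanRequest.writeHeaders(200, {"content-length": str(len(body))})
        self.chanRequest.writeData(body)
        self.chanRequest.finish()

    def connectionLost(self, reason):
        self.chunks = []


def drive(chunks):
    """Play the role of the channel: construct a Request and feed it the body."""
    chan = FakeChannelRequest()
    req = EchoRequest(chan, "POST", "/echo", (1, 1), {})
    for chunk in chunks:
        req.handleContentChunk(chunk)
    req.handleContentComplete()
    return chan
```

Note that the Request never sees chunked encoding, pipelining or keep-alive; anything like that would live behind the ChannelRequest methods.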
The core of the public interface to this whole thing is the set of fields and methods on Request:
  method: HTTP method used.
  uri: URI passed in the request.
  clientproto: Tuple like (1,1).
  out_headers: a Headers object containing the headers to output.
  in_headers: a Headers object containing the incoming headers.
  acceptData(self): Call to notify the sender that you intend to accept the request.
  checkPreconditions(self): Check whether the preconditions are satisfied, and thus whether the action should take place/the output data should be written.
  write(self, data): Call to write some data. If headers haven't been written yet, write them.
  writeFile(...): Call to write a file in an optimized way like sendfile(). TBD what actually goes here.
  finish(self): Call when you've finished writing data.
Callbacks to override:
  process(self): Called from __init__. Incoming headers have been received, but no data yet. Should do resource lookup.
  handleContentChunk(self, data): A chunk of data was received.
  handleContentComplete(self): The incoming data is done.
  connectionLost(self, reason): The underlying connection was lost.
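Here's a small sketch of how a subclass might use that interface, mainly to show process() being called from __init__ and write() sending the headers lazily on first use. RecordingChannel and HelloRequest are hypothetical names, the `code` attribute is my assumption (nothing above says where the status code lives), and out_headers is a plain dict standing in for a Headers object.

```python
class RecordingChannel:
    """Hypothetical stand-in for ChannelRequest that just records calls."""

    def __init__(self):
        self.calls = []

    def writeHeaders(self, code, headers, code_message=None):
        self.calls.append(("writeHeaders", code, dict(headers)))

    def writeData(self, data):
        self.calls.append(("writeData", data))

    def finish(self):
        self.calls.append(("finish",))


class Request:
    code = 200  # assumed attribute, not taken from the description above

    def __init__(self, chanRequest, command, path, version, in_headers):
        self.chanRequest = chanRequest
        self.method = command
        self.uri = path
        self.clientproto = version
        self.in_headers = in_headers
        self.out_headers = {}  # plain dict standing in for a Headers object
        self._headers_written = False
        self.process()  # headers are in, no body data yet

    def _ensure_headers(self):
        if not self._headers_written:
            self.chanRequest.writeHeaders(self.code, self.out_headers)
            self._headers_written = True

    def write(self, data):
        # Headers go out lazily, just before the first piece of data.
        self._ensure_headers()
        self.chanRequest.writeData(data)

    def finish(self):
        # Assumption: a response with no body still needs its headers sent.
        self._ensure_headers()
        self.chanRequest.finish()

    # Callbacks to override; the defaults do nothing.
    def process(self):
        pass

    def handleContentChunk(self, data):
        pass

    def handleContentComplete(self):
        pass

    def connectionLost(self, reason):
        pass


class HelloRequest(Request):
    def process(self):
        self.out_headers["content-type"] = "text/plain"

    def handleContentComplete(self):
        self.write(b"hello, ")
        self.write(b"world")
        self.finish()
```

The headers are written exactly once, no matter how many times write() is called.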
in_headers/out_headers are objects of type http_headers.Headers, which provides a standardized way of translating between raw string headers and structured data headers. Some of the header parsers are not written yet.
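The raw/structured translation idea can be sketched with a toy class. To be clear, ToyHeaders and its method names are invented for this example and are not the http_headers.Headers API; the per-header parser and generator tables, and the fallback to the raw string when no parser exists, are my assumptions about one way it could work.

```python
class ToyHeaders:
    """Toy illustration only; NOT the real http_headers.Headers API."""

    # Per-header converters from raw string to structured value and back.
    parsers = {
        "content-length": int,
        "accept-encoding": lambda raw: [t.strip() for t in raw.split(",")],
    }
    generators = {
        "content-length": str,
        "accept-encoding": ", ".join,
    }

    def __init__(self):
        self._raw = {}
        self._parsed = {}

    def set_raw(self, name, value):
        name = name.lower()
        self._raw[name] = value
        self._parsed.pop(name, None)  # drop any stale structured form

    def set_parsed(self, name, value):
        name = name.lower()
        self._parsed[name] = value
        self._raw.pop(name, None)  # drop any stale raw form

    def get_parsed(self, name):
        name = name.lower()
        if name not in self._parsed:
            # Headers without a parser fall back to the raw string.
            parser = self.parsers.get(name, lambda raw: raw)
            self._parsed[name] = parser(self._raw[name])
        return self._parsed[name]

    def get_raw(self, name):
        name = name.lower()
        if name not in self._raw:
            generator = self.generators.get(name, str)
            self._raw[name] = generator(self._parsed[name])
        return self._raw[name]
```

Converting lazily means a header that nobody looks at never gets parsed, which seems handy for something like a proxy that mostly passes headers through.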
Unlike the old Request, this one is going to do nothing with incoming data. No form processing, no buffering, no nothing. No args processing of the uri, either. The "full featured" subclass of Request (e.g. server.Request) can do that stuff. It is expected to do all URI frobbing and then a locateChild() lookup at process() time (before the data has arrived). Then, figure out what the located resource wants to do with the incoming data (ignore it, buffer it all up into one string, or pass it along as it comes in). Note that this means locateChild can't use POST arguments. Then, in the usual case, render() would be called after all the data has arrived and form processing has been done on it. But for some resources, e.g. a proxying resource, it would just send all the data straight through, without doing form processing.
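Roughly, the "full featured" flow could look like the sketch below. Everything here is hypothetical: the `data_policy` attribute, the IGNORE/BUFFER/STREAM constants, the resource classes, and the dict lookup standing in for locateChild(). It only illustrates the ordering: lookup at process() time, then per-resource handling of the incoming data, then render() once the data is complete.

```python
IGNORE, BUFFER, STREAM = "ignore", "buffer", "stream"


class FormResource:
    """Hypothetical resource that wants the whole body before rendering."""
    data_policy = BUFFER

    def render(self, body):
        return b"got %d bytes" % len(body)


class ProxyResource:
    """Hypothetical resource that passes data along as it comes in."""
    data_policy = STREAM

    def __init__(self):
        self.forwarded = []

    def dataReceived(self, data):
        self.forwarded.append(data)

    def render(self, body):
        return b""


class FullRequest:
    """Sketch of a server.Request-style subclass; not the real class."""

    def __init__(self, uri, site):
        self.uri = uri
        self.site = site  # dict lookup stands in for locateChild()
        self.buffer = []
        self.response = None
        self.process()

    def process(self):
        # Lookup happens before any body data, so only the URI is available.
        path = self.uri.split("?", 1)[0]
        self.resource = self.site[path]

    def handleContentChunk(self, data):
        policy = self.resource.data_policy
        if policy == BUFFER:
            self.buffer.append(data)
        elif policy == STREAM:
            self.resource.dataReceived(data)
        # IGNORE: drop the data on the floor

    def handleContentComplete(self):
        # Usual case: render once all the data has arrived.
        self.response = self.resource.render(b"".join(self.buffer))
```

Since process() runs before the first handleContentChunk(), the lookup simply has no way of seeing POST arguments, which is the limitation mentioned above.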