
Hi all. I've been away for a few days due to loss of e-mail service when my dedicated server lost a hard drive. Unfortunately, my ISP no longer supported the OS version, so I had to rebuild everything on the new OS version. Anyway, on to the topic of my post.

Should 'wsgi.input' become an iterator? Or should we develop a different API for asynchronous applications?

On the positive side, the iterator approach could make it easier for asynchronous applications to pause while waiting for input, and it could in principle support "chunked" transfer encoding of the input stream. However, since we last discussed this, I did some Googling on CGI and chunked encoding. By far the most popular links on the subject are all about bugs in IIS and Apache leading to various vulnerabilities when chunked encoding is used. :( Once you get past those items (e.g. by adding "-IIS -vulnerability" to your search), you then find *our* discussion here on the Web-SIG! Finally, digging further, I found some 1998 discussion from the IPP (Internet Printing Protocol!) mailing list about which HTTP/1.1 servers support chunked encoding for CGI and which don't. The long and short of it is that CGI and chunked encoding are quite simply incompatible, which means that relying on chunked input would be nonportable in a WSGI application anyway.

That leaves the asynchronous use case, but the benefit there is rather strained. Many frameworks reuse the 'cgi' module's 'FieldStorage' class to parse browser input, and its implementation requires an object with a 'readline()' method. That means that if we switch from an input stream to an iterator, a lot of people are going to end up writing wrappers to convert the iterator back into an input stream, which is just getting ridiculous, especially since in many cases the server or gateway has a file-like object to start with.

So, I'm thinking we should shift the burden to an async-specific API. But in this case, "burden" means that we get to give asynchronous apps an API much better suited to their use cases. Suppose we did something similar to 'wsgi.file_wrapper'? That is, suppose we had an optional extension that a server could provide, to wrap specialized application object(s) in a way that remains backward compatible with the spec. For example, a 'wsgi.async_wrapper', used like this:

    if 'wsgi.async_wrapper' in environ:
        controller = environ['wsgi.async_wrapper'](environ)
        # do stuff with controller, like register its
        # methods as callbacks
        return controller

The idea is that this would create an iterator that the server/gateway could recognize as "special", similar to the file-wrapper trick. But the object returned would provide an extra API for use by the asynchronous application, maybe something like:

    put(data)   -- queue data for retrieval when the controller is
                   iterated over

    finish()    -- mark the iterator finished, so it raises
                   StopIteration

    on_get(length, callback)
                -- call 'callback(data)' when 'length' bytes are
                   available on 'wsgi.input' (but return immediately
                   from the 'on_get()' call)

While this API is an optional extension, it seems it would be closer to what some async fans wanted, and less of a kludge. It won't do away with the possibility that middleware might block waiting for input, of course, but when no middleware is present, or the middleware isn't transforming the input stream, it should work out quite well.
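To make the shape of this concrete, here's a rough sketch of what such a controller might look like in a synchronous server or gateway. It's only an illustration: the 'AsyncController' name and its internals are made up, and only 'put()', 'finish()', and 'on_get()' come from the proposed API.

    # Sketch only -- everything except put/finish/on_get is hypothetical.
    class AsyncController:

        def __init__(self, environ):
            self.input = environ['wsgi.input']
            self.queue = []          # data blocks queued by put()
            self.finished = False    # set by finish()

        def put(self, data):
            # queue data for retrieval when the controller is iterated over
            self.queue.append(data)

        def finish(self):
            # mark the iterator finished
            self.finished = True

        def on_get(self, length, callback):
            # A synchronous gateway can just read and invoke the callback
            # immediately; an asynchronous gateway would instead register
            # the callback with its event loop and return right away.
            callback(self.input.read(length))

        def __iter__(self):
            return self

        def next(self):
            if self.queue:
                return self.queue.pop(0)
            if self.finished:
                raise StopIteration
            # An asynchronous gateway would suspend the response here until
            # put() or finish() is called; a synchronous gateway never gets
            # here, because its callbacks run before the app returns.
            return ''

        __next__ = next   # alias for the '__next__'-style iterator protocol

And an application might use it roughly like this (again, just a sketch; the fallback branch is the ordinary blocking read for servers that don't offer the extension):

    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        length = int(environ.get('CONTENT_LENGTH') or 0)

        if 'wsgi.async_wrapper' in environ:
            controller = environ['wsgi.async_wrapper'](environ)
            def got_input(data):
                controller.put("read %d bytes\n" % len(data))
                controller.finish()
            controller.on_get(length, got_input)   # returns immediately
            return controller

        # no extension available: ordinary blocking read
        data = environ['wsgi.input'].read(length)
        return ["read %d bytes\n" % len(data)]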
In any case, the implementation of the methods and the iterator interface is pretty straightforward, for either synchronous or asynchronous servers. What do y'all think? I'd especially like feedback from Twisted folks, as to whether this looks anything like the right kind of API for async apps. (I expect it will need some tweaking and tuning.) But if this is the right overall approach, I'd like to drop the current proposals to make 'wsgi.input' an iterator and to add optional 'pause'/'resume' APIs, since they were rather kludgy compared to giving async apps their own mini-API for nonblocking I/O. Comments? Questions?