[Web-SIG] PEP 444
alice at gothcandy.com
Sun Nov 21 12:12:34 CET 2010
(A version of this is available at http://web-core.org/2.0/pep-0444/; links are links, code may be easier to read.)
PEP 444 is quite exciting to me. So much so that I’ve been spending a few days writing a high-performance (C10K, 10K requests/sec) Py2.6+/3.1+ HTTP/1.1 server which implements much of the proposed standard. The server is functional (minus web3.input at the time of this writing), but differs from PEP 444 in several ways. It also adds several features I feel should be part of the spec.
Source for the server is available on GitHub:
I made several notes about the PEP 444 specification while implementing the above, and have concerns over some implementation details:
First, async is poorly defined:
> If the origin server advertises that it has the web3.async capability, a Web3 application callable used by the server is permitted to return a callable that accepts no arguments. When it does so, this callable is to be called periodically by the origin server until it returns a non-None response, which must be a normal Web3 response tuple.
Polling is not true async. I believe it should be up to the server to define how async is utilized, and the specification should be clarified on this point. (“Called periodically” is too vague.) “Callable” should likely be redefined as “generator” (a callable that yields), as most applications need to hold on to state, and wrapping everything in functools.partial() is somewhat ugly. Utilizing generators would improve support for existing Python async frameworks and allow four modes of operation: yield None (no response, keep waiting), yield response_tuple (standard response), return / raise StopIteration (close the async connection), and data passed back into the async callable by the higher-level async framework.
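To make the generator idea concrete, here is a rough sketch of the four modes. It is not taken from PEP 444 or from m.s.http; slow_app and drive are made-up names, and drive is only a toy stand-in for whatever loop a real server or async framework would run.

```python
def slow_app(env):
    """Hypothetical async application: returns a generator, not a response."""
    def responder():
        received = []
        while len(received) < 2:
            # Mode 1: yield None, meaning "no response yet, keep waiting".
            # Mode 4: whatever the driver passes to send() arrives here.
            value = yield None
            if value is not None:
                received.append(value)
        # Mode 2: yield a standard response tuple.
        yield (b'200 OK', [(b'Content-Type', b'text/plain')],
               [b' '.join(received)])
        # Mode 3: falling off the end raises StopIteration in the driver.
    return responder()


def drive(gen, events):
    """Toy server loop: returns a response tuple, or None if there is none."""
    try:
        result = next(gen)  # run to the first yield
        for event in events:
            if result is not None:
                break
            result = gen.send(event)  # pass data back into the application
    except StopIteration:
        return None  # the application closed the connection
    finally:
        gen.close()
    return result
```

Calling drive(slow_app({}), [b'hello', b'world']) hands both chunks to the application and gets a response tuple back; a real server would resume the generator when it has something to deliver rather than on a timer.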
Second, WSGI middleware, while impressive in capability, is somewhat… heavy-weight. Heavily nested function calls waste CPU and RAM, especially when a middleware layer decides it can’t operate; for example, GZip compression disabling itself for non-text/* mimetypes. The majority of WSGI middleware can, and probably should, be implemented as linear ingress or egress filters. For example, on-disk static file serving could be an ingress filter, and GZip compression an egress filter. m.s.http (marrow.server.http) supports this filtering and demonstrates one API for it. I am also in the process of writing an example egress CompressionFilter.
An example filter API and its use (paraphrased from marrow.server.http):
> # No filters, near 0 overhead.
> for filter_ in ingress_filters:
>     # Can mutate the environment.
>     result = filter_(env)
>     # Allow the filter to return a response rather than continuing.
>     if result:
>         # result is a status, headers, body_iter tuple
>         return result
> status, headers, body = application(env)
> for filter_ in egress_filters:
>     # Can mutate the environment, status, headers, body, or
>     # return completely new status, headers, and body.
>     status, headers, body = filter_(env, status, headers, body)
> return status, headers, body
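For illustration, here is what a pair of filters written against that API might look like. This is only a sketch under my own assumptions: ping_filter, compression_filter and handle are hypothetical names, header matching is deliberately naive (exact-case bytes), and this is not the CompressionFilter from marrow.server.http.

```python
import gzip


def ping_filter(env):
    """Hypothetical ingress filter: answer /ping directly, else continue."""
    if env.get('PATH_INFO') == b'/ping':
        return b'200 OK', [(b'Content-Type', b'text/plain')], [b'pong']
    return None


def compression_filter(env, status, headers, body):
    """Hypothetical egress filter: gzip text/* bodies, pass the rest through."""
    content_type = dict(headers).get(b'Content-Type', b'')
    if not content_type.startswith(b'text/'):
        return status, headers, body  # nothing to do, no extra nesting paid
    compressed = gzip.compress(b''.join(body))
    headers = [(k, v) for k, v in headers if k != b'Content-Length']
    headers.append((b'Content-Encoding', b'gzip'))
    headers.append((b'Content-Length', str(len(compressed)).encode('ascii')))
    return status, headers, [compressed]


def handle(env, application, ingress_filters=(), egress_filters=()):
    """The filter loop from the text, wrapped in a function so it can run."""
    for filter_ in ingress_filters:
        result = filter_(env)
        if result:
            return result
    status, headers, body = application(env)
    for filter_ in egress_filters:
        status, headers, body = filter_(env, status, headers, body)
    return status, headers, body
```

A filter that cannot operate simply returns its arguments unchanged, so a skipped filter costs one function call rather than a permanent extra layer in the call stack.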
The environment has some minor issues. I’ll write up my changes in RFC-style:
SERVER_NAME is REQUIRED and MUST contain the DNS name of the server OR virtual server name for the web server if available OR an empty bytestring if DNS resolution is unavailable. SERVER_ADDR is REQUIRED and MUST contain the web server’s bound IP address. URL reconstruction SHOULD use HTTP_HOST if available, SERVER_NAME if there is no HTTP_HOST, and fall back on SERVER_ADDR if SERVER_NAME is an empty bytestring.
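As a sketch of that fallback order (request_host is a made-up helper name, and I am assuming native-string keys with bytestring values here):

```python
def request_host(env):
    """Pick the host for URL reconstruction using the fallback order above."""
    host = env.get('HTTP_HOST')  # optional: only present if the client sent it
    if host:
        return host
    if env['SERVER_NAME']:  # required, but may be an empty bytestring
        return env['SERVER_NAME']
    return env['SERVER_ADDR']  # required: the bound IP address
```

Because SERVER_NAME and SERVER_ADDR are required, the helper can index them directly and only needs .get() for HTTP_HOST.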
CONTENT_LENGTH is REQUIRED and MUST be None if not defined by the client. Testing explicitly for None is more efficient than armoring against missing values; also, explicit is better than implicit. (Paste’s WSGI1 server defines CONTENT_LENGTH as 0, but this implies the client explicitly declared it as zero, which is not the case.)
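A small sketch of what this buys the application (declared_length is a hypothetical helper; I am assuming that a declared length arrives as a bytestring of digits):

```python
def declared_length(env):
    """Return the client's declared body length, or None if it sent none."""
    length = env['CONTENT_LENGTH']  # always present, so no KeyError armoring
    if length is None:
        return None  # the client did not declare a length
    return int(length)  # the client declared one, possibly zero
```

With this, "no length declared" (None) and "declared as zero" (0) stay distinguishable, which is exactly the information lost when the server defaults the value to 0.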
FRAGMENT and PARAMETERS are REQUIRED and are parsed out of the URL in the same way as the QUERY_STRING. FRAGMENT is the text after a hash mark (a.k.a. “anchor” to browsers, e.g. /foo#bar). PARAMETERS come before QUERY_STRING, and after PATH_INFO separated by a semicolon, e.g. /foo;bar?baz. Both values MUST be empty bytestrings if not present in the URL. (Rarely used — I’ve only seen it in Java and ColdFusion applications — but still useful.)
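The split described above can be sketched as follows. split_request_uri is a made-up name, not the parser in m.s.http, and it naively treats the first semicolon in the path as the start of PARAMETERS.

```python
def split_request_uri(uri):
    """Split a request URI (bytes) into path, parameters, query and fragment."""
    rest, _, fragment = uri.partition(b'#')  # FRAGMENT: text after the hash
    rest, _, query = rest.partition(b'?')    # QUERY_STRING: text after '?'
    path, _, parameters = rest.partition(b';')  # PARAMETERS: after ';'
    return path, parameters, query, fragment
```

For /foo;bar?baz this gives a path of /foo, parameters of bar and a query of baz; any part not present in the URL comes back as an empty bytestring, as required.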
Points of contention:
Changing the namespace seems needless. Using the wsgi.* namespace with a wsgi.version of (2, 0) will allow applications to easily armor themselves against incompatible use. That’s what wsgi.version is for! I’d add this as a strong “point of contention”. m.s.http keeps the wsgi namespace and uses a version of (2, 0).
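For example, an application could armor itself with a check along these lines (require_wsgi2 is a hypothetical helper, and the response tuple is only illustrative):

```python
def require_wsgi2(env):
    """Refuse to run under a server that does not speak the 2.x protocol."""
    version = env.get('wsgi.version')
    if version is None or version[0] != 2:
        raise RuntimeError('WSGI 2.x server required, got %r' % (version,))


def application(env):
    require_wsgi2(env)
    return b'200 OK', [(b'Content-Type', b'text/plain')], [b'ok']
```

Under a WSGI 1 server, wsgi.version is (1, 0), so the application fails loudly on the first request instead of misbehaving.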
That’s it so far. I may occasionally write in with additional ideas as I continue with my HTTP server implementation.