[Web-SIG] Proposal: Avoiding Serialization When Stacking Middleware
Phillip J. Eby
pje at telecommunity.com
Wed Mar 7 03:52:20 CET 2007
At 08:08 PM 3/6/2007 -0600, Ian Bicking wrote:
>Posted here: http://wsgi.org/wsgi/Specifications/avoiding_serialization
>Text copied below for discussion:
>:Title: Avoiding Serialization When Stacking Middleware
>:Author: Ian Bicking <ianb at colorstudy.com>
>:Discussions-To: Python Web-SIG <web-sig at python.org>
>This proposal gives a strategy for avoiding unnecessary serialization
>and deserialization of request and response bodies. It does so by
>attaching attributes to ``wsgi.input`` and the ``app_iter``, as well as
>a new environment key ``x-wsgiorg.want_parsed_response``.
>Output-transforming middleware often has to parse the upstream content,
>transform it, then serialize it back to a string for output. The
>original output may have already been in the parsed form that the
>middleware wanted. Or there may be more middleware that does similar
>transformations on the same kind of objects.
HTTP already includes a mechanism for specifying what types are accepted by
a content consumer: the "Accept" header. You can always add other values
to it to indicate the parsed values you can accept.
Of course, this doesn't really work well with WSGI - you want the result to
actually *be* WSGI... so you can use the WSGI way of doing this, which is
to have a standard wrapper for the specific content type you want to use.
The wrapper (as with the wsgi "file wrapper") simply puts a WSGI face on a
non-WSGI result body, converting it to an iterator of strings, and holding
other attributes known to the middleware or other application object.
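A minimal sketch of such a wrapper, assuming a JSON-like Python object as the parsed form and `json.dumps` as the serializer. The class name `ParsedResultWrapper` and its `parsed` attribute are hypothetical, not part of any WSGI spec. Like `wsgi.file_wrapper`, it is an ordinary iterable of byte strings to anything that doesn't know about it, while middleware that does know the class can read the parsed object directly.

```python
import json


class ParsedResultWrapper:
    """Hypothetical wrapper: a WSGI-compatible body that keeps the parsed object."""

    def __init__(self, parsed, serializer=json.dumps, encoding="utf-8"):
        self.parsed = parsed  # un-serialized result for middleware that knows this class
        self._serializer = serializer
        self._encoding = encoding

    def __iter__(self):
        # Consumers that don't know this class see an ordinary WSGI body: an
        # iterable of byte strings. Serialization only happens if they iterate.
        yield self._serializer(self.parsed).encode(self._encoding)

    def close(self):
        # Part of the WSGI iterable protocol; this sketch holds no resources.
        pass


body = ParsedResultWrapper({"a": 1})
print(body.parsed)     # aware middleware: {'a': 1}, no parsing needed
print(b"".join(body))  # everyone else: b'{"a": 1}'
```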
This could be implemented as an environ key containing a mapping from types
to wrapper functions. Middleware that wants a type just copies the mapping
and overwrites any entries it cares about. Applications that want to
return a non-serialized result just look up the type (using __mro__ order)
to find an applicable wrapper.
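A sketch of that lookup, assuming a hypothetical environ key `x-example.result_wrappers`; the key and helper names are illustrative only. The middleware copies both environ and the mapping before overriding an entry, so it never mutates what an outer component handed it, and the application walks `type(obj).__mro__` so a wrapper registered for a base class also applies to subclasses.

```python
WRAPPERS_KEY = "x-example.result_wrappers"  # hypothetical environ key


def with_wrapper(environ, cls, wrapper):
    """Middleware side: copy environ and the mapping, then override one entry."""
    new_environ = dict(environ)
    wrappers = dict(environ.get(WRAPPERS_KEY, {}))
    wrappers[cls] = wrapper
    new_environ[WRAPPERS_KEY] = wrappers
    return new_environ


def find_wrapper(environ, obj):
    """Application side: wrapper for the most specific registered type, or None."""
    wrappers = environ.get(WRAPPERS_KEY, {})
    for cls in type(obj).__mro__:
        if cls in wrappers:
            return wrappers[cls]
    return None  # no middleware asked for this type; serialize it normally


class Document:  # illustrative result types
    pass


class HTMLDocument(Document):
    pass


def wrap_document(doc):
    return [b"<doc/>"]  # stand-in for a real wrapper object


outer = with_wrapper({}, Document, wrap_document)
print(find_wrapper(outer, HTMLDocument()) is wrap_document)  # True, found via __mro__
print(find_wrapper(outer, 42))                                # None
```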
Notice that this approach doesn't require any special protocol for these
wrappers -- just WSGI. It's simpler to specify and simpler to implement
than what you propose, while addressing some of the open issues.
Yes, it does have some problems with interface vs. implementation. ISTM
that trying to solve that problem is effectively asking to revive or
reinvent PEP 246, however. But we could explicitly allow the use of type
names instead of the actual types.
>The same things apply to the parsing of ``wsgi.input``, specifically
>parsing form data. A similar strategy is presented to avoid
>unnecessarily reparsing that data.
I would rather offer an optional 'get_file_storage()' method or some such
as a blessed WSGI extension, than have such an open-ended "get whatever you
want from the input object" concept floating around. A strategy which
reinvents half of PEP 246 (the *old* PEP 246, before it became almost as
complicated as WSGI) seems like overkill to me.
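A minimal sketch of what such an optional method could look like, assuming an input wrapper installed by whichever component parses the body first. The class `CachingInput` is hypothetical. Because `cgi.FieldStorage` is no longer available in recent Python versions, this sketch handles only `application/x-www-form-urlencoded` bodies, parsing them with `urllib.parse.parse_qs` and caching the resulting dict; a real extension would presumably return a FieldStorage-like object instead.

```python
import io
from urllib.parse import parse_qs


class CachingInput:
    """Hypothetical wsgi.input wrapper; form data is parsed at most once."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._length = content_length
        self._parsed = None

    def read(self, size=-1):
        # Pass-through for code that reads the raw body. Note that in this
        # sketch, reading the body first leaves nothing for get_file_storage().
        return self._stream.read(size)

    def get_file_storage(self):
        # Parse on the first call; every later caller gets the same object.
        if self._parsed is None:
            body = self._stream.read(self._length)
            self._parsed = parse_qs(body.decode("latin-1"))
        return self._parsed


# In middleware this might be installed as:
#   environ["wsgi.input"] = CachingInput(environ["wsgi.input"],
#                                        int(environ.get("CONTENT_LENGTH") or 0))
raw = b"name=pje&lang=python"
inp = CachingInput(io.BytesIO(raw), len(raw))
print(inp.get_file_storage())                            # {'name': ['pje'], 'lang': ['python']}
print(inp.get_file_storage() is inp.get_file_storage())  # True
```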
>Obviously the code is not simple, but this is the nature of WSGI
Something I'd like to fix in WSGI 2.0, by getting rid of both
"start_response" and "write", but that's a discussion for another time.
>* You could simply parse everything every time.
>* You could pass data through callbacks in the environment (but this can
>break non-aware middleware).
>* You can make custom methods and keys for each case.
>* You can use something other than WSGI.
And you can use the established WSGI method for adding semantics to a
response, using a middleware-supplied wrapper. I think this is actually
the best alternative.
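To make the round trip concrete, here is a sketch of a transforming middleware that supplies its own wrapper, assuming the application looks it up under a hypothetical environ key `x-example.wrap_document`. If the application returns that wrapper, the middleware takes the parsed object straight off it; otherwise it falls back to parsing the serialized bytes. Any non-aware middleware in between just sees an iterable of bytes, and if it replaces the body, the fallback path still produces the right answer. All names here are illustrative.

```python
import json

WRAP_KEY = "x-example.wrap_document"  # hypothetical environ key


class DocWrapper:
    """Middleware-supplied wrapper: iterates as JSON bytes but keeps the object."""

    def __init__(self, doc):
        self.doc = doc

    def __iter__(self):
        yield json.dumps(self.doc).encode("utf-8")


def uppercase_keys(app):
    """Illustrative transforming middleware: upper-cases a JSON object's keys."""

    def middleware(environ, start_response):
        def filtered_start_response(status, headers, exc_info=None):
            # The body is rewritten below, so an upstream Content-Length is stale.
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            return start_response(status, headers, exc_info)

        environ = dict(environ)
        environ[WRAP_KEY] = DocWrapper  # offer the wrapper to the application
        body = app(environ, filtered_start_response)
        try:
            if isinstance(body, DocWrapper):
                doc = body.doc  # fast path: nothing to parse
            else:
                doc = json.loads(b"".join(body))  # fallback: reparse the bytes
        finally:
            if hasattr(body, "close"):
                body.close()
        doc = {key.upper(): value for key, value in doc.items()}
        return [json.dumps(doc).encode("utf-8")]

    return middleware


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/json")])
    doc = {"title": "hello"}
    wrap = environ.get(WRAP_KEY)
    # Use the wrapper if some middleware supplied one; otherwise serialize.
    return wrap(doc) if wrap else [json.dumps(doc).encode("utf-8")]


def ignore_start_response(status, headers, exc_info=None):
    pass


print(uppercase_keys(app)({}, ignore_start_response))  # [b'{"TITLE": "hello"}']
```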
In truth, it could be as simple as using the class's fully-qualified name
as an environ key (perhaps with a prefix or suffix), with the value being a
wrapper for objects implementing that protocol. No
x-foobar-wsgiorg-whatchamacallit cruft needed.
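A sketch of this name-keyed variant, assuming a hypothetical `wrapper:` prefix in front of the class's `module.qualname`; the prefix, helper names, and example classes are illustrative, not an agreed convention. Walking the MRO is an added assumption carried over from the earlier suggestion. Because the keys are plain strings, the middleware and the application only have to agree on a name, not share a registry object.

```python
PREFIX = "wrapper:"  # hypothetical prefix; the exact form is left open


def wrapper_key(cls):
    """Build an environ key from a class's fully-qualified name."""
    return PREFIX + cls.__module__ + "." + cls.__qualname__


def lookup_wrapper(environ, obj):
    """Check obj's classes in __mro__ order; return the first wrapper found."""
    for cls in type(obj).__mro__:
        wrapper = environ.get(wrapper_key(cls))
        if wrapper is not None:
            return wrapper
    return None  # no middleware offered a wrapper for this protocol


class Document:  # illustrative result type
    pass


class HTMLDocument(Document):
    pass


# Middleware publishes its wrapper under a plain string key...
environ = {wrapper_key(Document): lambda doc: [b"<doc/>"]}
# ...and the application finds it by name, without sharing a registry object.
wrap = lookup_wrapper(environ, HTMLDocument())
print(wrap(HTMLDocument()))  # [b'<doc/>']
```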
And, it's lightweight enough of a concept to be expressed as a simple "best
practice" design pattern.