[Web-SIG] Proposal: Avoiding Serialization When Stacking Middleware

Wed Mar 7 03:52:20 CET 2007

At 08:08 PM 3/6/2007 -0600, Ian Bicking wrote:
>Posted here: http://wsgi.org/wsgi/Specifications/avoiding_serialization
>
>Text copied below for discussion:
>
>
>:Title: Avoiding Serialization When Stacking Middleware
>:Author: Ian Bicking <ianb at colorstudy.com>
>:Discussions-To: Python Web-SIG <web-sig at python.org>
>:Status: Proposed
>:Created: 06-03-2007
>
>.. contents::
>
>Abstract
>--------
>
>This proposal gives a strategy for avoiding unnecessary serialization
>and deserialization of request and response bodies.  It does so by
>attaching attributes to ``wsgi.input`` and the ``app_iter``, as well as
>a new environment key ``x-wsgiorg.want_parsed_response``.
>
>Rationale
>---------
>
>Output-transforming middleware often has to parse the upstream content,
>transform it, then serialize it back to a string for output.  The
>original output may have already been in the parsed form that the
>middleware wanted.  Or there may be more middleware that does similar
>transformations on the same kind of objects.

HTTP already includes a mechanism for specifying what types are accepted by 
a content consumer: the "Accept" header.  You can always add other values 
to it to indicate the parsed values you can accept.

Of course, this doesn't work well with WSGI, because you want the result to 
actually *be* WSGI.  So you can do this the WSGI way instead, which is 
to have a standard wrapper for the specific content type you want to use.

The wrapper (as with the wsgi "file wrapper") simply puts a WSGI face on a 
non-WSGI result body, converting it to an iterator of strings, and holding 
other attributes known to the middleware or other application object.
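
As a rough illustration of such a wrapper (a minimal sketch only; the class 
name, the "parsed" attribute and the serializer argument are all 
hypothetical and not part of any spec, and modern Python 3 WSGI bodies are 
bytes rather than strings):

```python
import json


class ParsedBodyWrapper:
    """Hypothetical wrapper: gives a parsed object a WSGI-iterable face."""

    def __init__(self, parsed, serialize):
        # Aware middleware can read .parsed and skip re-parsing entirely.
        self.parsed = parsed
        # Callable turning the parsed object into a bytes body.
        self._serialize = serialize

    def __iter__(self):
        # Unaware consumers just iterate and receive the serialized body.
        yield self._serialize(self.parsed)


# An application returns the wrapper as its app_iter.
body = ParsedBodyWrapper({"a": 1}, lambda obj: json.dumps(obj).encode("utf-8"))
```

Serialization only happens if something actually iterates over the wrapper; 
middleware that recognizes it can work on the parsed object directly.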

This could be implemented as an environ key containing a mapping from types 
to wrapper functions.  Middleware that wants a type just copies the mapping 
and overwrites any entries it cares about.  Applications that want to 
return a non-serialized result just look up the type (using __mro__ order) 
to find an applicable wrapper.
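
Roughly, the lookup and override steps might look like this (a sketch under 
assumptions: the environ key name and both helper functions are made up for 
illustration):

```python
WRAPPERS_KEY = "example.body_wrappers"  # hypothetical environ key name


def find_wrapper(environ, obj):
    """Application side: find a wrapper for obj, walking its __mro__."""
    wrappers = environ.get(WRAPPERS_KEY, {})
    for cls in type(obj).__mro__:
        if cls in wrappers:
            return wrappers[cls]
    return None  # no wrapper offered; serialize as usual


def with_wrapper(environ, cls, wrapper):
    """Middleware side: copy the mapping and overwrite one entry."""
    new_environ = dict(environ)
    mapping = dict(environ.get(WRAPPERS_KEY, {}))
    mapping[cls] = wrapper
    new_environ[WRAPPERS_KEY] = mapping
    return new_environ
```

Copying the mapping rather than mutating it keeps an inner middleware's 
entries from leaking back out to whatever sits above it in the stack.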

Notice that this approach doesn't require any special protocol for these 
wrappers -- just WSGI.  It's simpler to specify, and simpler to implement 
than what you propose, while addressing some of the open issues.

Yes, it does have some problems with interface vs. implementation.  ISTM 
that trying to solve that problem is effectively asking to revive or 
reinvent PEP 246, however.  But we could explicitly allow the use of type 
names instead of the actual types.

>The same things apply to the parsing of ``wsgi.input``, specifically
>parsing form data.  A similar strategy is presented to avoid
>unnecessarily reparsing that data.

I would rather offer an optional 'get_file_storage()' method or some such 
as a blessed WSGI extension, than have such an open-ended "get whatever you 
want from the input object" concept floating around.  A strategy which 
reinvents half of PEP 246 (the *old* PEP 246, before it became almost as 
complicated as WSGI) seems like overkill to me.
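
For example, a consumer could probe for the method and fall back to parsing 
(a sketch only: 'get_file_storage()' is not an existing WSGI extension, the 
ParsedInput class and read_form helper are hypothetical, and a plain 
parse_qs dict stands in for a real file-storage object):

```python
import io
from urllib.parse import parse_qs


class ParsedInput(io.BytesIO):
    """Stand-in wsgi.input that carries an already-parsed form."""

    def __init__(self, raw, parsed):
        super().__init__(raw)
        self._parsed = parsed

    def get_file_storage(self):
        # Hypothetical extension method: hand back the parsed form.
        return self._parsed


def read_form(environ):
    stream = environ["wsgi.input"]
    getter = getattr(stream, "get_file_storage", None)
    if getter is not None:
        return getter()  # reuse the upstream parse
    # Fallback: parse the urlencoded body ourselves.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return parse_qs(stream.read(length).decode("ascii"))
```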

>Obviously the code is not simple, but this is the nature of WSGI
>output-transforming middleware.

That's something I'd like to fix in WSGI 2.0, by getting rid of both 
"start_response" and "write", but that's a discussion for another time.

>Other Possibilities
>-------------------
>
>* You could simply parse everything every time.
>* You could pass data through callbacks in the environment (but this can
>break non-aware middleware).
>* You can make custom methods and keys for each case.
>* You can use something other than WSGI.

And you can use the established WSGI method for adding semantics to a 
response, using a middleware-supplied wrapper.  I think this is actually 
the best alternative.

In truth, it could be as simple as using the class's fully-qualified name 
as an environ key (perhaps with a prefix or suffix), with the value being a 
wrapper for objects implementing that protocol.  No 
x-foobar-wsgiorg-whatchamacallit cruft needed.
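
Something along these lines, say (a sketch; the ".wrapper" suffix and the 
helper names are assumptions chosen for illustration, not a proposed 
convention):

```python
def qualified_name(cls):
    """Fully-qualified class name, e.g. 'builtins.dict'."""
    return f"{cls.__module__}.{cls.__qualname__}"


def find_wrapper_by_name(environ, obj, suffix=".wrapper"):
    """Look for '<qualified class name><suffix>' keys along obj's MRO."""
    for cls in type(obj).__mro__:
        wrapper = environ.get(qualified_name(cls) + suffix)
        if wrapper is not None:
            return wrapper
    return None
```

Since the key is just a string, middleware can offer a wrapper for a type 
without importing the class that implements it.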

And the concept is lightweight enough to be expressed as a simple "best 
practice" design pattern.