[Web-SIG] Proposal: Avoiding Serialization When Stacking Middleware

Wed Mar 7 05:51:39 CET 2007

At 09:43 PM 3/6/2007 -0600, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>The wrapper (as with the wsgi "file wrapper") simply puts a WSGI face on 
>>a non-WSGI result body, converting it to an iterator of strings, and 
>>holding other attributes known to the middleware or other application object.
>
>That just calls for a series of ad hoc techniques,

As is appropriate for a "series of tubes".  :)

>  basically, where each object type results in a new key in the 
> environment and a new ad hoc specification to be made (e.g., 
> wsgi.file_wrapper takes a block size, which is specific only to that case).

Right.  I'm specifically saying that a collection of individual 
specifications is much *better* than a single overarching specification 
generalized from a single example.  Single use cases make bad general specs.

>OK, the dict would avoid multiple different kinds of keys, and presumably 
>they'd all have the same signature.  Block size doesn't really make any 
>sense to me as a common parameter.  Content type should be a common 
>parameter, as something like an lxml object can be serialized as either 
>XML or HTML.  I don't think any response headers are likely to affect the 
>serialization... though with my specification that remains an application 
>concern, so it doesn't have to be resolved in the specification.

Please don't keep trying to generalize this.  They're called 
"specific-ations", not "general-izations".  :)

>>Notice that this approach doesn't require any special protocol for these 
>>wrappers -- just WSGI.  It's simpler to specify, and simpler to implement 
>>than what you propose, while addressing some of the open issues.
>
>The specification isn't particularly long or complicated, IMHO.

That's because it doesn't address any of the real issues -- they're all 
deferred to your "open issues" section.  That's why I don't think having 
the specification adds any value over highlighting the existing WSGI 
pattern for extending the response (i.e. server-supplied iterator-wrappers).

>When playing with implementation I used type names, and actually I rather 
>prefer them, but it's not always clear what name to use.  For instance, 
>"lxml", "lxml.etree", "lxml.etree.Element", and "lxml.etree._Element" all 
>are reasonable names.  Or "ElementTree", "ElementTree.Element", 
>"ElementTree._Element", "xml.etree", "xml.etree.Element", and 
>"xml.etree._Element".  Or even something like "IElement" could make sense 
>in some context (e.g., what if you can accept the overlapping interfaces 
>of both lxml and ElementTree?)
>
>At least the actual type object seems easy enough.  OTOH, there are 
>actually cases when I'd like to say that I could accept a certain type 
>without having to import the type.  E.g., if I wanted to do an XSLT 
>transformation, I *could* support several kinds of objects without 
>requiring any of them (e.g., lxml, 4DOM, and Genshi Markup).

These problems all stem from premature generalization.  It's a trivial 
problem to fix, however, if you are trying to share one particular content 
type: just pick a key and use it!

Libraries such as wsgiref can support this pattern by providing a utility 
function like "wrap_content(environ, content, default_wrapper, *keys)" that 
looks up "keys" in the environ to find a wrapper to use in place of the 
default_wrapper.
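
A minimal sketch of such a utility, assuming the first key found in the 
environ wins and that every wrapper takes the content as its only argument. 
The name and signature come from the sentence above; treat the function as 
hypothetical rather than as an existing wsgiref API. The key 
"example.text_wrapper" and the helper names are made up for illustration.

```python
def wrap_content(environ, content, default_wrapper, *keys):
    """Wrap `content` with the first wrapper found under `keys` in `environ`.

    Falls back to `default_wrapper` when none of the keys is present.
    """
    for key in keys:
        wrapper = environ.get(key)
        if wrapper is not None:
            return wrapper(content)
    return default_wrapper(content)


def default_text_wrapper(text):
    # Ordinary WSGI body: an iterable of byte strings.
    return [text.encode("utf-8")]


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    # "example.text_wrapper" is a hypothetical key that some middleware
    # further up the stack may or may not have installed.
    return wrap_content(environ, "hello", default_text_wrapper,
                        "example.text_wrapper")
```

With no middleware present the application simply returns the default 
serialized body, so it stays a plain WSGI application either way.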

>>>The same things apply to the parsing of ``wsgi.input``, specifically
>>>parsing form data.  A similar strategy is presented to avoid
>>>unnecessarily reparsing that data.
>>I would rather offer an optional 'get_file_storage()' method or some such 
>>as a blessed WSGI extension, than have such an open-ended "get whatever 
>>you want from the input object" concept floating around.  A strategy 
>>which reinvents half of PEP 246 (the *old* PEP 246, before it became 
>>almost as complicated as WSGI) seems like overkill to me.
>
>I don't really understand what you are proposing.

That wsgi.input be allowed to have a 'get_file_storage()' method that can 
be called by applications, and that calling it means the input stream must 
not have been read and will no longer be readable.
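
A rough sketch of what such an input object could look like. Only the 
method name 'get_file_storage()' and the "not yet read, no longer readable" 
rule come from the text above; the class name, the return value, and the 
use of parse_qs as a stand-in parser (urlencoded forms only) are 
assumptions. A real wsgi.input would also need readline(), readlines() and 
iteration, which are left out here.

```python
import io
from urllib.parse import parse_qs


class StorageInput:
    """Hypothetical wsgi.input wrapper offering get_file_storage()."""

    def __init__(self, stream, environ):
        self._stream = stream
        self._environ = environ
        self._touched = False    # True once read() has been called
        self._consumed = False   # True once get_file_storage() has been called

    def read(self, size=-1):
        if self._consumed:
            raise ValueError("input was consumed by get_file_storage()")
        self._touched = True
        return self._stream.read(size)

    def get_file_storage(self):
        # The stream must be untouched, and is unreadable afterwards.
        if self._touched or self._consumed:
            raise ValueError("input stream has already been read")
        self._consumed = True
        length = int(self._environ.get("CONTENT_LENGTH") or 0)
        raw = self._stream.read(length)
        # Stand-in for a real form parser (assumption, not part of the proposal).
        return parse_qs(raw.decode("utf-8"))
```

Because parsing happens only when the method is called, nothing is buffered 
or written to a temporary file unless an application actually asks for it.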

>This part addresses the same issues as presented in 
>http://wsgi.org/wsgi/Specifications/handling_post_forms
>
>I really don't *want* to write every wsgi.input to a temporary file just 
>because someone else *might* want to reparse the input.  I'd much rather 
>do it lazily, as 99% of the time reparsing won't happen.

I don't understand your complaint, as it seems unrelated to what I propose.

>>>Other Possibilities
>>>-------------------
>>>
>>>* You could simply parse everything every time.
>>>* You could pass data through callbacks in the environment (but this can
>>>break non-aware middleware).
>>>* You can make custom methods and keys for each case.
>>>* You can use something other than WSGI.
>>And you can use the established WSGI method for adding semantics to a 
>>response, using a middleware-supplied wrapper.  I think this is actually 
>>the best alternative.
>
>I really don't understand the advantage.

It's simple: *specifications are a liability in the general case*.  They 
are supposed to be the record of negotiations between people who need to 
co-operate, not an attempt to solve all possible problems.

So, if your spec is only about how relatively tightly coupled WFCs (WSGI 
framework components) talk to each other, it seems more properly the 
business of a web framework, not WSGI.

However, it *is* WSGI (wsgi-onic?) for the authors of certain components to 
get together and say, "hey let's agree on this wrapper protocol"...  or 
better yet, a wrapper *implementation*.

This is way way better than having another spec.  Every godforsaken new 
spec attached to WSGI just makes the whole thing seem way too 
complicated.  In retrospect, I wish I hadn't supported some of the options 
and doodads and whatnots that are in WSGI today.  If I had it to do over, 
WSGI would be a lot simpler.

However, it's not too late to stop adding new cruft -- and I consider the 
idea of reinventing PEP 246 inside of WSGI to be cruft of a most horrible kind.

>Best practice is fine, though of course still needs to be documented, as 
>this is hardly a practice that people would naturally think about or implement.

Well, it's in PEP 333.

>   But I don't really think that practice would be any simpler or easier 
> to describe if done completely.  In fact, I think it would take exactly 
> the same amount of space to describe.

Even if it *did*, it'd still be better.  However, since it's not a spec, it 
can be presented informally.  Here's an example:

"If you want to give applications underneath your middleware a chance to 
return rich responses (i.e., objects instead of strings), follow the 
pattern used for the WSGI 'file wrapper' object.  That is, have your server 
or middleware add an environ key with a wrapper API that can convert the 
richer objects you're expecting into a standard WSGI iterator.  Then, your 
server can simply inspect the iterators it receives to see if they are 
instances of your wrapper type, and pull out the objects you want.  In this 
way, if there is middleware between you and the application returning the 
rich response that modifies the response body, you will receive an iterator 
of a different type, which you can process in the usual way.  However, if 
you receive an instance of your wrapper type, you will know that you can 
access the rich data directly."
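
The pattern in the quoted paragraph can be sketched as follows. The key 
name "example.rich_body", the class RichBody, and the choice of a list of 
strings as the "rich" object are all hypothetical; the sketch also ignores 
close() handling on the response iterable.

```python
KEY = "example.rich_body"   # hypothetical environ key


class RichBody:
    """A normal WSGI iterable that also carries the original object."""

    def __init__(self, obj):
        self.obj = obj

    def __iter__(self):
        # Serialization happens only if somebody iterates over the body.
        yield "\n".join(self.obj).encode("utf-8")


def rich_middleware(app):
    def wrapped(environ, start_response):
        environ[KEY] = RichBody              # offer the wrapper downstream
        result = app(environ, start_response)
        if isinstance(result, RichBody):
            # Our own wrapper came back: use the rich object directly.
            return ["\n".join(sorted(result.obj)).encode("utf-8")]
        # Something in between replaced the body: treat it as plain WSGI.
        return result
    return wrapped


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    lines = ["b", "a"]
    wrapper = environ.get(KEY)
    if wrapper is None:
        return ["\n".join(lines).encode("utf-8")]
    return wrapper(lines)
```

Middleware that knows nothing about the key still sees an ordinary iterator 
of strings, and once it rewrites the body the isinstance check fails, so the 
outer layer falls back to ordinary processing.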

Now, can you expand this into more of a tutorial, give more hints and so 
on?  Absolutely.  It'd be a great idea to.  But the basic idea is simple 
and doesn't require rigorous definitions -- it just needs people to publish 
what keys they're using and the *specifications thereof*.

What you're trying to specify is effectively a *meta*-specification: much 
more difficult to do well, and not nearly as useful to have in this case.