[Web-SIG] Proposal: Avoiding Serialization When Stacking Middleware

Ian Bicking ianb at colorstudy.com
Tue Mar 13 20:47:54 CET 2007

Phillip J. Eby wrote:
>>  basically, where each object type results in a new key in the 
>> environment and a new ad hoc specification to be made (e.g., 
>> wsgi.file_wrapper takes a block size, which is specific only to that 
>> case).
> Right.  I'm specifically saying that a collection of individual 
> specifications is much *better* than a single overarching specification 
> generalized from a single example.  Single use cases make bad general 
> specs.
>> OK, the dict would avoid multiple different kinds of keys, and 
>> presumably they'd all have the same signature.  Block size doesn't 
>> really make any sense to me as a common parameter.  Content type 
>> should be a common parameter, as something like an lxml object can be 
>> serialized as either XML or HTML.  I don't think any response headers 
>> are likely to affect the serialization... though with my specification
>> that remains an application concern, so it doesn't have to be resolved 
>> in the specification.
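A minimal sketch of the single-dict idea. It assumes a hypothetical registry that maps a type to a serializer with the common signature (obj, content_type) -> bytes. The standard library's xml.etree.ElementTree stands in for lxml here, because it can also render one element as either XML or HTML. The names SERIALIZERS and serialize are illustrative and are not part of any spec.

```python
import xml.etree.ElementTree as ET


def serialize_element(elem, content_type):
    # One object, two serializations: the content type decides which.
    if content_type == "text/html":
        return ET.tostring(elem, method="html")
    if content_type in ("application/xml", "text/xml"):
        return ET.tostring(elem, method="xml")
    raise ValueError("cannot serialize an Element as %r" % content_type)


# Hypothetical registry: every entry shares the signature
# (obj, content_type) -> bytes, so one environ dict could hold them all.
SERIALIZERS = {ET.Element: serialize_element}


def serialize(obj, content_type):
    # Look up the object's own class first, then its base classes.
    for cls in type(obj).__mro__:
        func = SERIALIZERS.get(cls)
        if func is not None:
            return func(obj, content_type)
    raise TypeError("no serializer registered for %s" % type(obj).__name__)
```

For example, with br = ET.Element("br"), serialize(br, "text/html") gives b"<br>", while serialize(br, "application/xml") gives b"<br />".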
> Please don't keep trying to generalize this.  They're called 
> "specific-ations", not "general-izations".  :)
>>> Notice that this approach doesn't require any special protocol for 
>>> these wrappers -- just WSGI.  It's simpler to specify, and simpler to 
>>> implement than what you propose, while addressing some of the open 
>>> issues.
>> The specification isn't particularly long or complicated, IMHO.
> That's because it doesn't address any of the real issues -- they're all 
> deferred to your "open issues" section.  That's why I don't think having 
> the specification adds any value over highlighting the existing WSGI 
> pattern for extending the response (i.e. server-supplied 
> iterator-wrappers).

The open issues section has three issues.  One is a matter of defining 
some naming convention, and as long as people *try* to match up their 
conventions it will work.  The second has a proposed solution.  The last 
is merely aesthetic.

These are the "real issues" you are referring to?

>> When playing with implementation I used type names, and actually I 
>> rather prefer them, but it's not always clear what name to use.  For 
>> instance, "lxml", "lxml.etree", "lxml.etree.Element", and 
>> "lxml.etree._Element" all are reasonable names.  Or "ElementTree", 
>> "ElementTree.Element", "ElementTree._Element", "xml.etree", 
>> "xml.etree.Element", and "xml.etree._Element".  Or even something like 
>> "IElement" could make sense in some context (e.g., what if you can 
>> accept the overlapping interfaces of both lxml and ElementTree?)
>> At least the actual type object seems easy enough.  OTOH, there are 
>> actually cases when I'd like to say that I could accept a certain type 
>> without having to import the type.  E.g., if I wanted to do an XSLT 
>> transformation, I *could* support several kinds of objects without 
>> requiring any of them (e.g., lxml, 4DOM, and Genshi Markup).
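Accepting a type without importing it can be sketched by matching on dotted name strings taken from the object's class hierarchy. This is a hypothetical helper, not an existing API. The lxml name in the table is illustrative, and "builtins.dict" is included only so the demo has something to match. One caveat shows up right away: the name a class reports for itself is not always the name people import it by. For instance, lxml's public Element is a factory function, and the class is _Element. That is the same naming ambiguity described above.

```python
from collections import OrderedDict


def dotted_names(obj):
    # "module.QualName" for the object's class and each base class, so a
    # consumer can match types by string without importing their modules.
    for cls in type(obj).__mro__:
        yield "%s.%s" % (cls.__module__, cls.__qualname__)


def find_handler(obj, handlers):
    # handlers: dict of dotted type name -> callable; first MRO match wins.
    for name in dotted_names(obj):
        handler = handlers.get(name)
        if handler is not None:
            return handler
    return None


# An XSLT-style consumer could list candidate types it never imports.
HANDLERS = {
    "lxml.etree._Element": lambda obj: "lxml path",
    "builtins.dict": lambda obj: "dict path",
}

if __name__ == "__main__":
    print(find_handler(OrderedDict(), HANDLERS)(None))
```

An OrderedDict matches through its base class dict. An int matches nothing, so find_handler returns None.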
> These problems all stem from premature generalization.  It's a trivial 
> problem to fix, however, if you are trying to share one particular 
> content type: just pick a key and use it!

That's not much easier, really.  It would still be documented, still 
needs to be implemented and defined properly.  The biggest difference is 
that it needs to be done again for each type of object.

> Libraries such as wsgiref can support this pattern by providing a 
> utility function like "wrap_content(environ, content, default_wrapper, *keys)" 
> that looks up "keys" to find a wrapper to use in place of the 
> default_wrapper.
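A rough sketch of such a helper, following only the one-line description above. No such function exists in wsgiref. The key name "example.text_wrapper" and the as_bytes default are hypothetical.

```python
def wrap_content(environ, content, default_wrapper, *keys):
    """Hypothetical helper: wrap `content` with the first wrapper that a
    server or middleware published under one of `keys`, else the default."""
    for key in keys:
        wrapper = environ.get(key)
        if wrapper is not None:
            return wrapper(content)
    # Nobody upstream asked for the rich object: fall back to plain WSGI.
    return default_wrapper(content)


def as_bytes(text):
    # Default wrapper: an ordinary WSGI body (a list of byte strings).
    return [text.encode("utf-8")]
```

With an empty environ, wrap_content(environ, "hi", as_bytes, "example.text_wrapper") returns [b"hi"]. If something upstream has set that key, its wrapper is used instead.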
>>>> The same things apply to the parsing of ``wsgi.input``, specifically
>>>> parsing form data.  A similar strategy is presented to avoid
>>>> unnecessarily reparsing that data.
>>> I would rather offer an optional 'get_file_storage()' method or some 
>>> such as a blessed WSGI extension, than have such an open-ended "get 
>>> whatever you want from the input object" concept floating around.  A 
>>> strategy which reinvents half of PEP 246 (the *old* PEP 246, before 
>>> it became almost as complicated as WSGI) seems like overkill to me.
>> I don't really understand what you are proposing.
> That wsgi.input be allowed to have a 'get_file_storage()' method that 
> can be called by applications, and that calling it means the input 
> stream must not have been read and will no longer be readable.
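One reading of that proposal, as a minimal sketch: a hypothetical wsgi.input wrapper whose get_file_storage() refuses to run once the raw stream has been read, and which makes the raw stream unreadable afterwards. Three details here are assumptions rather than parts of the proposal. First, urllib.parse.parse_qs stands in for cgi.FieldStorage, so only urlencoded bodies are handled. Second, repeated calls return the same parsed object. Third, RuntimeError is the chosen failure mode.

```python
import io
from urllib.parse import parse_qs


class FormAwareInput:
    """Hypothetical wsgi.input wrapper that offers get_file_storage().

    get_file_storage() may only be called before the raw stream has been
    read; once it has run, the raw stream is no longer readable."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._length = content_length
        self._touched = False  # someone has read the raw stream
        self._storage = None   # parsed form, once get_file_storage() ran

    def _start_raw_read(self):
        if self._storage is not None:
            raise RuntimeError("input was consumed by get_file_storage()")
        self._touched = True

    # The four methods WSGI requires of wsgi.input, all guarded.
    def read(self, size=-1):
        self._start_raw_read()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._start_raw_read()
        return self._stream.readline(size)

    def readlines(self, hint=-1):
        self._start_raw_read()
        return self._stream.readlines(hint)

    def __iter__(self):
        self._start_raw_read()
        return iter(self._stream)

    def get_file_storage(self):
        if self._storage is not None:
            return self._storage  # assumption: repeat calls share one parse
        if self._touched:
            raise RuntimeError("input stream has already been read")
        body = self._stream.read(self._length)
        # Stand-in for cgi.FieldStorage: urlencoded bodies only.
        self._storage = parse_qs(body.decode("latin-1"))
        return self._storage


if __name__ == "__main__":
    form_input = FormAwareInput(io.BytesIO(b"a=1&b=2&a=3"), 11)
    print(form_input.get_file_storage())
```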
>> This part addresses the same issues as presented in 
>> http://wsgi.org/wsgi/Specifications/handling_post_forms
>> I really don't *want* to write every wsgi.input to a temporary file 
>> just because someone else *might* want to reparse the input.  I'd much 
>> rather do it lazily, as 99% of the time reparsing won't happen.
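The lazy approach can be sketched roughly as follows. This is loosely in the spirit of the handling_post_forms page and is not a copy of it. Nothing is buffered until some component actually asks for the form. The parsed result is cached in the environment alongside the input object it came from, and if middleware swaps in a new wsgi.input, the cache is invalidated. The environ key and the urlencoded-only parsing are assumptions for illustration.

```python
import io
from urllib.parse import parse_qs

CACHE_KEY = "example.parsed_form"  # hypothetical environ key


def get_form(environ):
    """Parse an urlencoded POST body at most once per request.

    Nothing is buffered unless some component actually asks for the form."""
    cached = environ.get(CACHE_KEY)
    # The cache is valid only while wsgi.input is still the stream we left;
    # if middleware has swapped in a different input, parse again.
    if cached is not None and cached[0] is environ["wsgi.input"]:
        return cached[1]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    form = parse_qs(body.decode("latin-1"))
    # Put the body back so components that read wsgi.input directly still work.
    new_input = io.BytesIO(body)
    environ["wsgi.input"] = new_input
    environ[CACHE_KEY] = (new_input, form)
    return form
```

Because the body is only held in memory once someone calls get_form, requests that never parse the form pay nothing extra.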
> I don't understand your complaint, as it seems unrelated to what I propose.

I didn't understand what you were proposing, I think.  I still don't 
really know what get_file_storage means.

>>>> Other Possibilities
>>>> -------------------
>>>> * You could simply parse everything every time.
>>>> * You could pass data through callbacks in the environment (but this
>>>> can break non-aware middleware).
>>>> * You can make custom methods and keys for each case.
>>>> * You can use something other than WSGI.
>>> And you can use the established WSGI method for adding semantics to a 
>>> response, using a middleware-supplied wrapper.  I think this is 
>>> actually the best alternative.
>> I really don't understand the advantage.
> It's simple: *specifications are a liability in the general case*.  They 
> are supposed to be the record of negotiations between people who need to 
> co-operate, not an attempt to solve all possible problems.

This certainly doesn't solve all possible problems, it only addresses 
one particular issue.

> So, if your spec is only about how relatively tight-coupled WFC's (WSGI 
> framework components) talk to each other, it seems more properly the 
> business of a web framework, not WSGI.

Most of the places I want to use this are *not* at the framework level. 
A simple example is just parsing form data without having to own the 
data, which is an outstanding issue with WSGI stacks, and can be done 
outside of a framework.  Another is how to communicate non-string data 
while having graceful fallback for string data.  This is of particular 
interest to me, as I turn WSGI into HTTP quite often, and there's 
definitely nothing but strings at that point.

> However, it *is* WSGI (wsgi-onic?) for the authors of certain components 
> to get together and say, "hey let's agree on this wrapper protocol"...  
> or better yet, a wrapper *implementation*.
> This is way way better than having another spec.  Every godforsaken new 
> spec attached to WSGI just makes the whole thing seem way too 
> complicated.  In retrospect, I wish I hadn't supported some of the 
> options and doodads and whatnots that are in WSGI today.  If I had it to 
> do over, WSGI would be a lot simpler.

This is a wsgiorg.* specification, not a wsgi.* one, and it's not meant to 
solve all issues.  It is meant to be implementation-neutral.

> However, it's not too late to stop adding new cruft -- and I consider 
> the idea of reinventing PEP 246 inside of WSGI to be cruft of a most 
> horrible kind.
>> Best practice is fine, though of course still needs to be documented, 
>> as this is hardly a practice that people would naturally think about 
>> or implement.
> Well, it's in PEP 333.

It's a nice idea, but as far as I know no one has actually used 
wsgi.file_wrapper.  Though so far no one has paid very close attention 
to these kinds of performance issues either.  I think using it in a 
useful way requires platform-specific twiddling that no one cares to do.
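For reference, the application-side usage that PEP 333 describes looks roughly like the sketch below. The application uses wsgi.file_wrapper when the server offers it and otherwise falls back to a plain block iterator. wsgiref.util.FileWrapper is a real, generic implementation of that wrapper. The performance benefit only appears when a server recognises its own wrapper type and takes a faster, platform-specific path. The in-memory file and the block size are illustrative. A real file opened this way would also need to be closed.

```python
import io
from wsgiref.util import FileWrapper

BLOCK_SIZE = 8192  # illustrative block size


def file_app(environ, start_response):
    # In-memory stand-in for a real file, so the sketch runs anywhere.
    filelike = io.BytesIO(b"x" * 20000)
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server may recognise its own wrapper type and, for example,
        # hand a real file descriptor to a platform-specific fast path.
        return wrapper(filelike, BLOCK_SIZE)
    # Plain-WSGI fallback: iterate over fixed-size blocks until EOF.
    return iter(lambda: filelike.read(BLOCK_SIZE), b"")


if __name__ == "__main__":
    result = file_app({"wsgi.file_wrapper": FileWrapper}, lambda s, h: None)
    print(type(result).__name__, len(b"".join(result)))
```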

>> But I don't really think that practice would be any simpler or 
>> easier to describe if done completely.  In fact, I think it would take 
>> exactly the same amount of space to describe.
> Even if it *did*, it'd still be better.  However, since it's not a spec, 
> it can be presented informally.  Here's an example:
> "If you want to give applications underneath your middleware a chance to 
> return rich responses (i.e., objects instead of strings), follow the 
> pattern used for the WSGI 'file wrapper' object.  That is, have your 
> server or middleware add an environ key with a wrapper API that can 
> convert the richer objects you're expecting into a standard WSGI 
> iterator.  Then, your server can simply inspect the iterators it 
> receives to see if they are instances of your wrapper type, and pull out 
> the objects you want.  In this way, if there is middleware between you 
> and the application returning the rich response that modifies the 
> response body, you will receive an iterator of a different type, which 
> you can process in the usual way.  However, if you receive an instance 
> of your wrapper type, you will know that you can access the rich data 
> directly."
> Now, can you expand this into more of a tutorial, give more hints and so 
> on?  Absolutely.  It'd be a great idea to.  But the basic idea is simple 
> and doesn't require rigorous definitions -- it just needs people to 
> publish what keys they're using and the *specifications thereof*.
> What you're trying to specify is effectively a *meta*-specification: 
> much more difficult to do well, and not nearly as useful to have in this 
> case.
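The informal file-wrapper pattern described in the quoted text can be sketched in a few lines. Everything here is hypothetical: the environ key, the RichResponse class, and the serve() helper, which stands in for a server. The wrapper is still an ordinary WSGI iterable, so unaware middleware simply serializes it by iterating. The close() protocol is omitted for brevity.

```python
import json

RICH_KEY = "example.rich_wrapper"  # hypothetical environ key


class RichResponse:
    """Carries a rich object but is still an ordinary WSGI iterable."""

    def __init__(self, obj):
        self.obj = obj

    def __iter__(self):
        # Only consumers that don't recognise this type pay for serialization.
        yield json.dumps(self.obj).encode("utf-8")


def app(environ, start_response):
    data = {"greeting": "hello"}
    start_response("200 OK", [("Content-Type", "application/json")])
    wrap = environ.get(RICH_KEY)
    if wrap is not None:
        return wrap(data)
    return [json.dumps(data).encode("utf-8")]


def body_rewriting_middleware(inner):
    # Middleware that changes the body returns a different iterator type,
    # so outer layers correctly stop seeing the rich object.
    def wrapped(environ, start_response):
        return [chunk.upper() for chunk in inner(environ, start_response)]
    return wrapped


def serve(application, environ):
    # Stand-in "server": publish the wrapper, then inspect what comes back.
    environ[RICH_KEY] = RichResponse
    app_iter = application(environ, lambda status, headers: None)
    if isinstance(app_iter, RichResponse):
        return "rich", app_iter.obj
    return "bytes", b"".join(app_iter)
```

Called directly, serve(app, {}) gets the dict back untouched. With body_rewriting_middleware in between, it receives only bytes, which it processes in the usual way.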

Except insofar as "type" is variable in my specification, I don't think 
it is substantially different.

If no one cares about this, then I guess I can just put it under the 
httpencode namespace where it was before, but I don't see any reason to 
make it less general.

Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
