[Web-SIG] WSGI Open Space @ PyCon.

Graham Dumpleton graham.dumpleton at gmail.com
Mon Mar 30 02:13:45 CEST 2009


2009/3/30 Robert Brewer <fumanchu at aminus.org>:
> We had a smaller third meeting and answered more issues.
>
> Those present at the third meeting:
>
>  * Mark Ramm (TG)
>  * Mike Orr (Pylons)
>  * Bob Brewer (CherryPy)
>  * Glyph Lefkowitz (Twisted)
>  * David Reid (Twisted)
>  * Jean-Paul Calderone (Twisted)
>
> Continuing Topic: string type for PATH_INFO and SCRIPT_NAME
> -----------------------------------------------------------
>
> Much discussion on how to safely decode the Request-URI. Several options
> were put forth, including schemes where both unicode and bytes are stuck
> in the environ. Final rough consensus was that, even though request
> headers MUST be unicode in the environ, SCRIPT_NAME and PATH_INFO
> probably MUST be byte strings in order to not "guess wrong" about their
> encoding.

This may be a problem if we want to support CGI/WSGI bridges, because in
Python 3.0 os.environ is unicode and so the conversion has already been
done. The issue is whether one can convert it back to bytes safely.
Presumably the encoding that was applied is known from somewhere.
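
As a rough sketch of that round trip: assuming the bridge knows the
encoding, and assuming undecodable bytes were mapped with the
'surrogateescape' error handler (an assumption on my part, not something
Python 3.0 guarantees), the conversion back could look like the
following. The helper name is made up for illustration.

```python
def environ_value_to_bytes(value, encoding="utf-8"):
    # Assumption: the value was decoded with this same encoding and the
    # 'surrogateescape' error handler, so undecodable bytes were kept
    # as lone surrogates and can be restored exactly.
    return value.encode(encoding, "surrogateescape")


# A path mixing a stray Latin-1 byte with valid UTF-8.
raw = b"/caf\xe9/\xc3\xa9"
decoded = raw.decode("utf-8", "surrogateescape")
restored = environ_value_to_bytes(decoded)
```

If the decode step used some other error handler, such as 'replace',
the original bytes are gone and no conversion back is safe.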

> In addition, a new environ key which indicates whether
> %2F-slashes were decoded improperly or not would be beneficial.

Someone want to define 'decoded improperly'? :-)

But then, I see that a subsequent followup has already answered that in
part.

> Continuing Topic: wsgi.input
> ----------------------------
>
> Glyph: iterable is good; file-like is also OK.

Was there any mention of chunked request input and how to handle it if a
file-like object is still used?

I still believe we must require an empty string as the EOS sentinel, and
require a change to how wsgi.input is consumed so that it is read until
an empty string is returned, rather than only reading what the content
length says.
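
A minimal sketch of what consuming wsgi.input that way might look like,
with io.BytesIO standing in for the server's input stream and the
function name chosen just for this example:

```python
import io


def read_request_body(stream, block_size=8192):
    """Read until read() returns an empty string, ignoring CONTENT_LENGTH."""
    chunks = []
    while True:
        data = stream.read(block_size)
        if not data:  # the empty string is the end-of-stream sentinel
            break
        chunks.append(data)
    return b"".join(chunks)


# io.BytesIO stands in for environ['wsgi.input'] here.
body = read_request_body(io.BytesIO(b"chunked request body"), block_size=4)
```

The loop never looks at a content length, so it behaves the same whether
or not the request was chunked.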

> Big issue: need a way for the app to tell the server that it is waiting
> on output from some other source, possibly running in the same event
> loop.
>
>      _______ Reactor ______
>     /                      \
> +--------+  +--------+  +----------+
> |  IMAP  |==|  App   |==|  Server  |
> +--------+  +--------+  +----------+
>
> Yielding an empty string (as WSGI 1.0 does) does not provide enough
> information; the app needs a way to yield a token which tells the server
> "don't call my next() method again until my other source has given me
> more input on which to operate."

Which almost sounds like you want to allow a specific implementation
to supply a special class via the WSGI environment, an instance of
which could be passed back instead of a string from the iterable.
So, like wsgi.file_wrapper(), but instead of replacing the whole
iterable, it becomes one element of it.

This could actually be generalised rather than being tied to this one
use case. For example, the WSGI environment key might be
'wsgi.script_control', sort of akin to the Script-Control response
header in CGI 1.2. The arguments to this when called could be a string
naming what is being controlled, and a tuple or dictionary whose meaning
is specific to what is being controlled.
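
To make the idea concrete, here is a rough sketch. Everything in it is
hypothetical: the 'wsgi.script_control' key, the ScriptControl class,
the "wait-for-fd" name and its arguments are invented for illustration
and are not part of any WSGI specification.

```python
class ScriptControl:
    """Hypothetical control element supplied by the server."""

    def __init__(self, what, args):
        self.what = what  # string naming what is being controlled
        self.args = args  # tuple or dict, meaning depends on 'what'


def application(environ, start_response):
    control = environ["wsgi.script_control"]  # hypothetical key
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"first part\n"
    # One element of the iterable, not a replacement for the whole of it.
    yield control("wait-for-fd", {"fd": 7})
    yield b"second part\n"


def run(app):
    """Toy server loop that separates control elements from body data."""
    environ = {"wsgi.script_control": ScriptControl}
    written, controls = [], []

    def start_response(status, headers, exc_info=None):
        pass

    for element in app(environ, start_response):
        if isinstance(element, ScriptControl):
            controls.append((element.what, element.args))
        else:
            written.append(element)
    return written, controls


written, controls = run(application)
```

Only the server knows the class, so only the server can act on the
control element.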

The problem with this is what happens if a WSGI middleware tries to do
something with it. If the separate change is made to allow string-like
objects to be returned instead of only string objects, then its
string-like behaviour could be to appear as an empty string. Thus a
middleware would see it as an empty string. Of course, the WSGI
middleware may then suppress it and not pass it back down the line.
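
A small sketch of that failure, assuming byte strings in the iterable.
The ControlToken class and the "pause" name are hypothetical, and
drop_empty() stands in for any middleware that discards empty strings.

```python
class ControlToken(bytes):
    """Hypothetical string-like control element that looks empty."""

    def __new__(cls, what, args=None):
        self = super().__new__(cls)  # always the empty byte string
        self.what = what
        self.args = args
        return self


def drop_empty(app_iter):
    """Middleware-style filter that suppresses empty strings."""
    for element in app_iter:
        if element:
            yield element


token = ControlToken("pause")
passed_on = list(drop_empty([b"a", token, b"b"]))
```

The token compares equal to an empty string, so the filter removes it
and the server never sees it.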

As with wsgi.file_wrapper, the problem is that only the WSGI adapter
really knows what the type of the iterable instance being returned is.
A WSGI middleware can't work it out, except maybe by creating a dummy
instance of one and comparing the types. Even then, that may not be
guaranteed.
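
The dummy instance trick might look something like this. It is only a
sketch: wsgiref.util.FileWrapper stands in for whatever the server
actually supplies, and the function name is made up.

```python
import io
from wsgiref.util import FileWrapper


def looks_like_file_wrapper(environ, app_iter):
    """Guess whether app_iter came from environ['wsgi.file_wrapper']."""
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is None:
        return False
    # Create a throwaway instance purely to learn its type.
    dummy = wrapper(io.BytesIO(b""))
    return type(app_iter) is type(dummy)


environ = {"wsgi.file_wrapper": FileWrapper}
wrapped = FileWrapper(io.BytesIO(b"data"))
is_wrapper = looks_like_file_wrapper(environ, wrapped)
is_list_wrapper = looks_like_file_wrapper(environ, [b"data"])
```

This breaks down if the server's wrapper is a factory that can return
different types depending on what it is given, which is why the check
is not guaranteed.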

This therefore makes it hard for a WSGI middleware to even detect such
special control/meta elements and simply pass them through. Passing
them through wouldn't work anyway, as a WSGI middleware could be trying
to combine all the strings into one big string so it can stick a
content length on it. In that case it isn't even going to know that a
special control element is saying that it should stop trying to do that
and instead flush out what it has already accumulated, so that the
underlying server can do other stuff while waiting for the file
descriptor to be ready.
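
For illustration, a stripped-down middleware of that kind. It is a
sketch only: it ignores the write() callable and error handling, and
the names are invented for this example.

```python
def add_content_length(app):
    """Buffer the whole response so a Content-Length can be added."""

    def middleware(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)

        # Drains the iterable completely before the server sees anything.
        body = b"".join(app(environ, capture))
        headers = captured["headers"] + [("Content-Length", str(len(body)))]
        start_response(captured["status"], headers)
        return [body]

    return middleware


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"part one, "
    yield b""  # WSGI 1.0 style "nothing to send yet" signal
    yield b"part two"


seen = []


def record_start_response(status, headers, exc_info=None):
    seen.append((status, headers))


result = add_content_length(app)({}, record_start_response)
```

The server receives a single string, so the empty string the
application yielded in the middle never reaches it.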

A WSGI middleware doing this sort of thing is going to screw you up
even with the empty string that might be getting used for this purpose
now. In short, I can't see how you can do it this way, as you have no
guarantee that a WSGI middleware will not hold on to it, in which case
it never gets back to the underlying WSGI adapter.

> Asynchronous WSGI support
> -------------------------
>
> Mostly non-existent. Fix it? Fork it? Drop it? Glyph seemed to think
> we're really close if we fix wsgi.input.

Which fix/change to wsgi.input are we talking about here?

> Response value type
> -------------------
>
> Glyph suggested as the response tuple grows (e.g. by adding a "close"
> method), we should more consider returning an object with .status,
> .headers, .body, and .close attributes. Packing/unpacking tuples becomes
> tedious. Everyone agreed. If changing to an object is not possible, then
> a tuple should not have a variable length; that is, no members would be
> optional. Returning a dict would be another option (which would allow
> optional keys).

I'll have something more to say about this another time. :-)

> Continuing deferred issues
> --------------------------
>
>  * Lots of little changes: the server's supported HTTP version,
>   file_wrapper edge cases, etc.
>  * Python 3, and the scheduling of WSGI improvements (version roadmap)
>  * Lifecycle methods (start/stop/etc event API driven by the container)

Graham

