[Web-SIG] Python 3.0 and WSGI 1.0.

Thu Apr 2 00:30:02 CEST 2009

Graham Dumpleton wrote:
> 2009/4/2 Guido van Rossum <guido at python.org>:
> > On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <ianb at colorstudy.com> wrote:
> >> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <guido at python.org> wrote:
> >>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
> >>>> Good timing. We had been thinking to make everything strings except
> >>>> for SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are
> >>>> pulled from the Request-URI, which may be in any encoding. It was
> >>>> thought that the app would be best-qualified to decode those three.
> >>>
> >>> Argh. The *meaning* of these fields is clearly text. It would be most
> >>> unfortunate if all apps were required to deal with decoding bytes
> >>> for these (there is no choice any more, unlike in 2.x). I appreciate
> >>> the sentiment that the encoding is unknown, but I would much prefer it
> >>> if there was a default encoding that the app could override, or if
> >>> there was some other mechanism whereby the app would not have to be
> >>> bothered with decoding bytes unless it cared.
> >>
> >> This might be fine, except it is hard.  You can't just take arbitrary
> >> bytes and do script_name.decode('utf8'), and then when you realize you
> >> had it wrong do script_name.encode('utf8').decode('latin1').
> >
> > Well you could make the bytes versions available under different keys.
> > I think you do something a bit similar to this in webob, e.g. req.params
> > vs. req.str_params. (Perhaps you could have QUERY_STRING and
> > QUERY_BYTES.) The decode() call used to create the text strings could
> > use 'replace' as the error handler and the app could check for the
> > presence of the replacement character ('\ufffd') in the string to see
> > if there was a problem; or it could just work with the string
> > containing that character and report to the user some kind of 40x or 50x
> > error. Frameworks (like webob) would of course do the right thing and
> > look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> > be optional.
> 
> Can we please not invent new names at global context in WSGI
> environment dictionary, especially ones that mutate existing names
> rather than using a prefix or suffix.
> 
> If we are going to carry values in two different formats, then use the
> 'wsgi' name space. Thus, for byte versions of values perhaps use:
> 
>   wsgi.request_uri
>   wsgi.script_name
>   wsgi.path_info
>   wsgi.query_string
>   etc
> 
> In other words, leave all the existing CGI variables to come through
> as latin-1 decode and do anything new in 'wsgi' variable namespace,
> identifying only the minimal set which needs to be made available as
> bytes.
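
To make the two quoted proposals concrete, here is a minimal Python 3 sketch, not a proposed API. It decodes QUERY_STRING with the 'replace' error handler, keeps the raw bytes under a 'wsgi.'-prefixed key, and checks for the replacement character. The helper names and the UTF-8 default are assumptions for illustration; nothing in this thread fixes a default encoding, and 'wsgi.query_string' is only the key name suggested above.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, inserted by the 'replace' error handler


def add_query_string(environ, raw, encoding="utf-8"):
    """Store a text QUERY_STRING plus the raw bytes (hypothetical helper).

    The UTF-8 default is an assumption made for this sketch only.
    """
    environ["QUERY_STRING"] = raw.decode(encoding, "replace")
    environ["wsgi.query_string"] = raw  # bytes version, key name as suggested above
    return environ


def decoded_cleanly(environ):
    """True if the text value contains no replacement character."""
    return REPLACEMENT not in environ["QUERY_STRING"]


good = add_query_string({}, "q=caf\u00e9".encode("utf-8"))
bad = add_query_string({}, "q=caf\u00e9".encode("latin-1"))

# An app that cares can fall back to the bytes and pick its own encoding.
if not decoded_cleanly(bad):
    recovered = bad["wsgi.query_string"].decode("latin-1")
```

One caveat with this check: a query string that legitimately contains U+FFFD would be flagged as well, so an app that cares about that case would have to look at the bytes.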

Some thoughts:

 1. If we always decode as Latin-1 it should be lossless, and consumers could retrieve the original bytes with val.encode('Latin-1'), thus removing the need for separate entries.

 2. CGI says, "REMOTE_USER = *OCTET" :(

 3. Bikeshed: "wsgi.xyz" is too close to "XYZ" in my opinion.
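
A quick sketch of point 1 in Python 3, for illustration only. The helper name `recode` is made up here, and the example path and the choice of UTF-8 as the "real" encoding are assumptions. The idea is that Latin-1 maps every byte value to exactly one code point, so the server can decode unconditionally and the app can get the bytes back and decode them again however it likes.

```python
def recode(value, encoding):
    """Recover the original bytes from a Latin-1-decoded value and
    decode them with the encoding the app actually wants.
    (Hypothetical helper, not part of any spec.)"""
    return value.encode("latin-1").decode(encoding)


# Bytes as they might arrive in the Request-URI (UTF-8 here, by assumption).
raw = "/caf\u00e9".encode("utf-8")

# What the server would put in the environ under this scheme.
path_info = raw.decode("latin-1")

# The round trip loses nothing, so no separate bytes entry is needed.
original = path_info.encode("latin-1")
text = recode(path_info, "utf-8")

# This holds for every possible byte value, not just this example.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode("latin-1").encode("latin-1")
```

The cost is that an app which reads PATH_INFO without recoding sees mojibake for anything outside Latin-1, so the convention has to be well known.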

Robert Brewer
fumanchu at aminus.org