[Web-SIG] Python 3.0 and WSGI 1.0.

Graham Dumpleton graham.dumpleton at gmail.com
Wed Apr 1 22:51:35 CEST 2009


2009/4/2 Guido van Rossum <guido at python.org>:
> On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <ianb at colorstudy.com> wrote:
>> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <guido at python.org> wrote:
>>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
>>>> Good timing. We had been thinking to make everything strings except for
>>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
>>>> from the Request-URI, which may be in any encoding. It was thought that
>>>> the app would be best-qualified to decode those three.
>>>
>>> Argh. The *meaning* of these fields is clearly text. It would be most
>>> unfortunate if all apps were required to deal with decoding bytes
>>> for these (there is no choice any more, unlike in 2.x). I appreciate
>>> the sentiment that the encoding is unknown, but I would much prefer it
>>> if there was a default encoding that the app could override, or if
>>> there was some other mechanism whereby the app would not have to be
>>> bothered with decoding bytes unless it cared.
>>
>> This might be fine, except it is hard.  You can't just take arbitrary
>> bytes and do script_name.decode('utf8'), and then when you realize you
>> had it wrong do script_name.encode('utf8').decode('latin1').
>
> Well you could make the bytes versions available under different keys.
> I think you do something similar to this in webob, e.g. req.params
> vs. req.str_params. (Perhaps you could have QUERY_STRING and
> QUERY_BYTES.) The decode() call used to create the text strings could
> use 'replace' as the error handler and the app could check for the
> presence of the replacement character ('\ufffd') in the string to see
> if there was a problem; or it could just work with the string
> containing that character and report to the user some kind of 40x or 50x
> error. Frameworks (like webob) would of course do the right thing and
> look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> be optional.
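A minimal Python 3 sketch of the exchange above, assuming the server passes the raw (already unquoted) bytes through unchanged. QUERY_BYTES is only the key name floated in this thread, not part of any WSGI specification, and decode_replace and status_for are hypothetical helpers. It shows Ian's objection (once a lossy 'replace' decode has run, the original bytes cannot be recovered) and Guido's suggested check for U+FFFD.

```python
REPLACEMENT = "\ufffd"

def decode_replace(raw: bytes, encoding: str = "utf-8") -> str:
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")

def status_for(environ: dict) -> str:
    # Prefer the raw bytes when the server supplied them under the
    # hypothetical QUERY_BYTES key; otherwise fall back to QUERY_STRING.
    raw = environ.get("QUERY_BYTES")
    if raw is not None:
        text = decode_replace(raw)
    else:
        text = environ.get("QUERY_STRING", "")
    # A replacement character means the guessed encoding was wrong.
    return "400 Bad Request" if REPLACEMENT in text else "200 OK"

# Ian's objection: after 'replace' has run, the original bytes are gone.
raw = b"name=caf\xe9"                   # 'café' encoded as latin-1, not UTF-8
text = decode_replace(raw)             # 'name=caf\ufffd'
assert text.encode("utf-8") != raw     # re-encoding cannot recover b'\xe9'

print(status_for({"QUERY_BYTES": raw}))                # 400 Bad Request
print(status_for({"QUERY_STRING": "name=caf\u00e9"}))  # 200 OK
```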

Can we please not invent new names in the global context of the WSGI
environment dictionary, especially ones derived by altering existing
names (QUERY_STRING becoming QUERY_BYTES) rather than by adding a
prefix or suffix.

If we are going to carry values in two different formats, then use the
'wsgi' namespace. For the byte versions of values, perhaps use:

  wsgi.request_uri
  wsgi.script_name
  wsgi.path_info
  wsgi.query_string
  etc.

In other words, let all the existing CGI variables come through as
latin-1 decoded strings, and do anything new in the 'wsgi' variable
namespace, identifying only the minimal set of values that need to be
made available as bytes.
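A sketch of this proposal, again in Python 3. The wsgi.* key names are the ones suggested in this message, not part of any published WSGI specification, and build_environ and path_info_as are hypothetical helpers. Because latin-1 maps each byte 0-255 to exactly one code point, the decode never fails, and an application can always get the original bytes back with encode('latin-1') even when the wsgi.* key is absent.

```python
# Hypothetical key names from the proposal above; not part of any WSGI spec.
RAW_KEYS = ("REQUEST_URI", "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING")

def build_environ(raw_vars: dict) -> dict:
    """Server side: CGI variables as latin-1 str, raw bytes under 'wsgi.'."""
    environ = {}
    for name, value in raw_vars.items():
        # latin-1 maps every byte 0-255 to one code point, so this never fails.
        environ[name] = value.decode("latin-1")
        if name in RAW_KEYS:
            # e.g. PATH_INFO -> wsgi.path_info, kept as the original bytes.
            environ["wsgi." + name.lower()] = value
    return environ

def path_info_as(environ: dict, encoding: str = "utf-8") -> str:
    """App side: reinterpret PATH_INFO in the encoding the app expects."""
    raw = environ.get("wsgi.path_info")
    if raw is None:
        # No bytes supplied: latin-1 is lossless, so re-encode to recover them.
        raw = environ["PATH_INFO"].encode("latin-1")
    return raw.decode(encoding)

environ = build_environ({"PATH_INFO": "/café".encode("utf-8"),
                         "REQUEST_METHOD": b"GET"})
print(environ["PATH_INFO"])   # /cafÃ© (latin-1 view of the UTF-8 bytes)
print(path_info_as(environ))  # /café
```

Nothing is lost in this arrangement: the latin-1 strings keep existing CGI-style code working, while a framework that knows the real encoding can decode again from the bytes.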

Graham

