[docs] [issue16679] Add advice about non-ASCII wsgiref PATH_INFO

Andrew Clover report at bugs.python.org
Thu Apr 21 13:55:05 EDT 2016


Andrew Clover added the comment:

> Why only PATH_INFO is encoded in such a manner, but QUERY_STRING is passed without any changes and does not requires any latin-1 to utf-8 recodings?

Laziness: QUERY_STRING should be pure-ASCII, making any such transcoding a no-op.

In principle a user agent *can* submit non-ASCII characters in a query string without %-encoding them, but it's not standards-conformant and most browsers don't usually do it (exception: apparently curl as above), so it's not worth adding a layer of hopefully-fixing-but-potentially-mangling to this variable to support a situation that shouldn't arise for normal requests.

PATH_INFO only requires special handling because of the sad, sad historical artefact of the CGI spec requiring it to have URL-decoding applied to it at the gateway, thus making the non-ASCII characters pop out of the percentage woodwork.

@Graham can you share more about how those test results were generated and displayed? The Gunicorn results are about what I would expect - the double-decoding of PATH_INFO is arguably undesirable when curl submits raw bytes, but ultimately that's an unspecified situation so I don't really case.

The output from Apache, on the other hand, is odd - something appears to have mangled the results at the reporting stage as not only is there double-decoding but also some double-backslashes. It looks like the strings have been put through ascii(repr()) or something?

----------
nosy: +Andrew Clover

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16679>
_______________________________________


More information about the docs mailing list