[Web-SIG] Python 3: Form data encoding issues in cgi and urllib modules

Miles Kaufmann milesck at umich.edu
Thu Apr 16 00:26:47 CEST 2009


On Wed, Apr 15, 2009 at 5:23 PM, Graham Dumpleton wrote:
> 2009/4/16 Miles Kaufmann <milesck at umich.edu>:
>> So: does anyone agree, or disagree, that cgi.FieldStorage should be
>> changed to take byte streams, and many of the cgi and urllib.parse
>> functions should become encoding-aware, preferably in time for Python
>> 3.1?  The byte-stream change will break compatibility with Python
>> 3.0, but I strongly feel that treating POST data as text is wrong and
>> should not continue to be supported.
>
> Have you read:
>
>  http://bugs.python.org/issue3300
>
> This was referenced in a prior post here and is likely relevant. A lot
> of the discussion for that was happening on the developers list for Python
> 3.0.

I hadn't. Thanks for the link! That was a long read, so apologies if I
missed anything, but that discussion seems to pertain almost entirely
to the urllib.parse.[un]quote* functions; there was only one point
where it was mentioned that there would be issues with non-UTF-8 data
for higher-level functions[1], and nothing followed from that.

I don't think it should be a controversial move to add encoding and
errors parameters to the following functions:

* urllib.parse.parse_qs
* urllib.parse.parse_qsl
* urllib.parse.urlencode

which, I feel, would be in line with the outcome of the discussion you
referenced, shouldn't break any existing code, and would make it
possible to parse the "quite prevalent"[2] instances of non-UTF-8
query strings like the following:

'premier=un&deuxi%E8me=deux' # latin-1
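To make the proposal concrete, here is a small sketch of what those parameters buy you on the query string above. It assumes a later Python 3 release in which urllib.parse.parse_qs and urllib.parse.urlencode already accept encoding and errors keyword arguments. The Python 3.0 version under discussion here does not have them, so treat this as an illustration of the intended behaviour rather than of 3.0.

```python
from urllib.parse import parse_qs, urlencode

qs = 'premier=un&deuxi%E8me=deux'  # Latin-1 encoded query string

# Default decoding is UTF-8 with errors='replace', so the lone 0xE8 byte
# is turned into U+FFFD and the original field name is lost:
# default == {'premier': ['un'], 'deuxi\ufffdme': ['deux']}
default = parse_qs(qs)

# Telling the parser the real charset recovers the name intact:
# latin1 == {'premier': ['un'], 'deuxième': ['deux']}
latin1 = parse_qs(qs, encoding='latin-1')

# The same parameter on the encoding side emits Latin-1 percent-escapes:
# encoded == 'deuxi%E8me=deux'
encoded = urlencode({'deuxième': 'deux'}, encoding='latin-1')
```

Only the caller knows which charset the form was submitted in, so letting the caller pass it through seems like the least surprising design. Hard-coding UTF-8 silently corrupts everything else.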

The parameters would also need to be added to cgi.parse,
cgi.parse_multipart, and cgi.FieldStorage, if they were in fact
changed to expect a bytes file input, as I suggest.
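As a rough sketch of the "bytes in, decode last" approach for POST bodies, the helper below reads an application/x-www-form-urlencoded body from a binary stream and decodes names and values only after percent-decoding. parse_form_bytes is a hypothetical name, not part of the cgi module or of any proposed FieldStorage API. It assumes the charset is supplied by the application (for example from the Content-Type header or site configuration), ignores multipart bodies entirely, and, unlike parse_qs, keeps blank values.

```python
import io
from urllib.parse import unquote_to_bytes

def parse_form_bytes(stream, length, charset="utf-8", errors="replace"):
    """Hypothetical helper: parse a urlencoded body read from a *binary*
    stream, decoding to str only after percent-decoding."""
    body = stream.read(length)  # bytes, not str
    result = {}
    for pair in body.split(b"&"):
        if not pair:
            continue  # skip empty segments such as a trailing '&'
        name, _, value = pair.partition(b"=")
        # '+' means space in form encoding; undo it before percent-decoding
        name = unquote_to_bytes(name.replace(b"+", b" "))
        value = unquote_to_bytes(value.replace(b"+", b" "))
        # Decode with the caller-supplied charset only at the very end
        result.setdefault(name.decode(charset, errors), []).append(
            value.decode(charset, errors))
    return result

# Usage: a Latin-1 body arriving on a byte stream, as wsgi.input would supply
data = b"premier=un&deuxi%E8me=deux"
fields = parse_form_bytes(io.BytesIO(data), len(data), charset="latin-1")
```

Because the stream is never decoded as a whole, raw non-ASCII bytes and percent-escapes are handled the same way. A text-mode stream would already have committed to some encoding before the form's charset was known.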

> Not sure why someone was taking issue with the WEB-SIG list over cgi
> FieldStorage issues as I don't recollect us having any substantive
> discussion about it and any problems it has.

Exactly; that person's issue was that there hasn't been substantive
discussion, which is what I'm trying to start now. :)

-Miles Kaufmann

[1]: http://bugs.python.org/msg70970
[2]: http://lists.w3.org/Archives/Public/www-international/2008JulSep/0042.html

