[Python-Dev] urllib.quote and unquote - Unicode issues

Matt Giuca matt.giuca at gmail.com
Wed Aug 6 01:03:06 CEST 2008


> After the most recent flurry of discussion I've lost track of what's
> the right thing to do. I also believe it was said it should wait until
> 2.7/3.0, so there's no hurry (in fact there's no way to check it -- we
> don't have branches for those versions yet).
>

I assume you mean 2.7/3.1.

I've always been concerned with the suggestion that this wait till 3.1. I
figure this patch is going to change the documented behaviour of these
functions, so it might be unacceptable to change it after 3.0 is released.
It seems logical that this patch be part of the
"incompatible-for-the-sake-of-fixing-things" set of changes in 3.0.

The current behaviour is broken. Any code which uses quote to produce a URL,
then unquotes the same URL later, will simply break for characters outside
the Latin-1 range. This is evident in the SimpleHTTPServer class, as I said
above: it presents users with URLs for the files in a directory using quote,
then gives a 404 when they click on them, because unquote can't handle those
URLs. And it will break any user's code which also assumes unquote is the
inverse of quote.

We could hack a fix into SimpleHTTPServer and expect other users to do the
same (along the lines of .encode('utf-8').decode('latin-1')), but then those
hacks will break when we apply the patch in 3.1 because they abuse Unicode
strings, and we'll have to have another debate about how to be backwards
compatible with them. (The patched version is largely compatible with the
2.x version, but the unpatched version isn't compatible with either the 2.x
version or the patched version.)

Surely the sane option is to get this UTF-8 patch into version 3.0 so we
don't have to carry this bug into the future? I'm far less concerned about
the decision with regard to unquote_to_bytes/quote_from_bytes, as those are
new features which can wait.
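For reference, functions with these names do exist in current Python 3's urllib.parse. A short sketch of why they are useful: they operate on raw bytes, so no text encoding is involved and even byte sequences that are not valid UTF-8 round-trip exactly. The sample data below is arbitrary.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# UTF-8 bytes for two CJK characters, escaped and restored without any decoding step.
raw = "日本".encode("utf-8")
escaped = quote_from_bytes(raw)
assert unquote_to_bytes(escaped) == raw

# Bytes that are not valid UTF-8 also round-trip, since nothing is decoded as text.
blob = b"\xff\xfe/a b"
assert unquote_to_bytes(quote_from_bytes(blob)) == blob
```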

Matt Giuca