[Python-Dev] urllib.quote and unicode bug resuscitation attempt
John J Lee
jjl at pobox.com
Tue Jul 11 20:43:22 CEST 2006
On Tue, 11 Jul 2006, Stefan Rank wrote:
> urllib.quote fails on unicode strings and in an unhelpful way::
[...]
>     >>> urllib.quote(u'a\xf1a')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>       File "C:\Python24\lib\urllib.py", line 1117, in quote
>         res = map(safe_map.__getitem__, s)
>     KeyError: u'\xf1'
More helpful than silently producing the wrong answer.
[...]
> I suggest to add (after 2.5 I assume) one of the following to the
> beginning of urllib.quote to either fail early and consistently on
> unicode arguments and improve the error message::
>
>     if isinstance(s, unicode):
>         raise TypeError("quote needs a byte string argument, not unicode,"
>                         " use `argument.encode('utf-8')` first.")
Won't this break existing code that catches the KeyError, for no big
benefit? If nobody is yet sure what the Right Thing is (see below), I
think we should not change this yet.
> or to do The Right Thing (tm), which is utf-8 encoding::
>
>     if isinstance(s, unicode):
>         s = s.encode('utf-8')
>
> as suggested in
> http://www.w3.org/International/O-URL-code.html
> and rfc3986.
You seem quite confident of that. You may be correct, but have you read
all of the following? (not trying to claim superior knowledge by asking
that, I just dunno what the right thing is yet: I haven't yet read RFC
2617 or got my head around what the unicode issues are or how they should
apply to the Python stdlib)
http://www.ietf.org/rfc/rfc2617.txt
http://www.ietf.org/rfc/rfc2616.txt
http://en.wikipedia.org/wiki/Percent-encoding
http://mail.python.org/pipermail/python-dev/2004-September/048944.html
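
To make concrete why the choice of encoding matters here, a minimal sketch follows. It is an illustration only, not a proposed patch: quote_with_encoding is a hypothetical helper name, and the import falls back from Python 3's urllib.parse to the Python 2 urllib module under discussion so that the snippet runs on either. The same unicode string percent-encodes differently depending on which byte encoding is applied first, so silently picking one is a real decision.

```python
try:
    from urllib.parse import quote  # Python 3 location of quote
except ImportError:
    from urllib import quote  # Python 2, the module discussed in this thread


def quote_with_encoding(text, encoding):
    """Hypothetical helper: encode unicode text to bytes, then percent-quote.

    This is an assumption for illustration, not part of urllib.
    """
    return quote(text.encode(encoding))


# The same unicode input quoted via two different byte encodings.
utf8_result = quote_with_encoding(u'a\xf1a', 'utf-8')
latin1_result = quote_with_encoding(u'a\xf1a', 'latin-1')

print(utf8_result)    # a%C3%B1a
print(latin1_result)  # a%F1a
```

A server that expects Latin-1 will not decode the UTF-8 form to the same character, which is why it seems worth settling what the relevant specs require before urllib.quote starts encoding on the caller's behalf.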
Also note the recent discussions here about a module named "uriparse" or
"urischemes", which fits into this somewhere. It would be good to make
all the following changes in a single Python release (2.6, with luck):
- extend / modify urllib and urllib2 to handle unicode input
- address the urllib.quote issue you raise above (+ consider the other
  utility functions in that module)
- add the urischemes module
In summary, I agree that your suggested fix (and all of the rest I refer
to above) should wait for 2.6, unless somebody (Martin?) who understands
all these issues is quite confident your suggested change is OK.
Presumably the release managers wouldn't allow it in 2.5 anyway.
John