[issue3991] urllib.request.urlopen does not handle non-ASCII characters

Daniel Diniz report at bugs.python.org
Sun Feb 8 22:50:20 CET 2009


Daniel Diniz <ajaksu at gmail.com> added the comment:

I think Toshio's use case is important enough to deserve a fix (patch
attached) or a special-cased error message. IMO, newbies trying to debug
failures from urlopen may have a hard time finding their way through the maze:

urlopen -> _opener -> open -> _open -> _call_chain -> http_open -> 
do_open (and that's before leaving urllib!).

>>> from urllib.request import urlopen
>>> url = 'http://localhost/ñ.html'
>>> urlopen(url).read()
Traceback (most recent call last):
[...]
UnicodeEncodeError: 'ascii' codec can't encode character '\xf1' in
position 5: ordinal not in range(128)


If the newbie isn't completely lost by then, how about:
>>> from urllib.parse import quote
>>> urlopen(quote(url)).read()
Traceback (most recent call last):
[...]
ValueError: unknown url type: http%3A//localhost/%C3%B1.html
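
The second failure happens because quote() treats only '/' as safe by
default, so the ':' after the scheme is escaped as well and urlopen no
longer recognizes the URL type. A minimal workaround sketch, assuming the
non-ASCII characters are confined to the path and that UTF-8 is the
encoding the server expects, is to split the URL and quote just the path.
The helper name quote_url_path is hypothetical and is not taken from the
attached patch; non-ASCII host names and query strings are not handled here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path component of url (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep '/' as the path separator and '%' so that escapes already
    # present in the path are not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(quote_url_path("http://localhost/ñ.html"))
```

The result can then be passed to urlopen() in place of the original URL.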

----------
keywords: +patch
nosy: +ajaksu2
Added file: http://bugs.python.org/file12986/non_ascii_path.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3991>
_______________________________________

