[New-bugs-announce] [issue3991] urllib.request.urlopen does not handle non-ASCII characters

Toshio Kuratomi report at bugs.python.org
Sun Sep 28 20:47:16 CEST 2008


New submission from Toshio Kuratomi <a.badger at gmail.com>:

Tested on python-3.0rc1 -- Linux Fedora 9

I wanted to make sure that python3.0 would handle URLs in different
encodings.  So I created two files on an Apache server, both named
½ñ.html.  One of the filenames was encoded in UTF-8 and the other in
Latin-1.  First I tried passing the URL as UTF-8 encoded bytes::

>>> from urllib.request import urlopen
>>> url = 'http://localhost/u/½ñ.html'
>>> urlopen(url.encode('utf-8')).read()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.0/urllib/request.py", line 122, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python3.0/urllib/request.py", line 350, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'

The same thing happens if I give None for the two optional arguments
(data and timeout).

Next I tried passing the unencoded str (Unicode string) directly:

>>> urlopen(url).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.0/urllib/request.py", line 122, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python3.0/urllib/request.py", line 359, in open
    response = self._open(req, data)
  File "/usr/lib/python3.0/urllib/request.py", line 377, in _open
    '_open', req)
  File "/usr/lib/python3.0/urllib/request.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.0/urllib/request.py", line 1082, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.0/urllib/request.py", line 1068, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "/usr/lib/python3.0/http/client.py", line 843, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.0/http/client.py", line 860, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.0/http/client.py", line 751, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position
7-8: ordinal not in range(128)
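
The last frame can probably be reproduced without a server.  The sketch
below assumes that http.client builds a request line of the form
"METHOD selector HTTP/1.1" before encoding it; the literal string is my
reconstruction, not something copied from the library source.

```python
# Assumed shape of the request line built for the URL above.
# The exact format is a guess based on the traceback, not library code.
request_line = "GET /u/½ñ.html HTTP/1.1"

try:
    request_line.encode("ascii")
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

if failure is not None:
    # exc.end is exclusive, so the reported range is start..end-1.
    print(failure.start, failure.end - 1)
```

If that assumption holds, the two offending characters sit right after
"GET /u/", which matches the "position 7-8" in the traceback.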

So, in python-3.0rc1, urlopen() accepts neither a bytes URL nor a str
URL that contains non-ASCII characters.
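
A possible workaround, which I have not verified against the server, is
to percent-encode the path by hand before calling urlopen().  This is a
minimal sketch: percent_encode_path is a hypothetical helper, not part
of urllib, and it assumes urllib.parse.quote accepts an encoding
argument and that the path contains no existing %XX escapes (they would
be encoded a second time).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url, encoding):
    """Return url with its path percent-encoded in the given charset.

    Hypothetical helper for illustration; not a urllib function.
    """
    parts = urlsplit(url)
    # Keep "/" literal so the path separators survive.
    path = quote(parts.path, safe="/", encoding=encoding)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


url = 'http://localhost/u/½ñ.html'
utf8_url = percent_encode_path(url, 'utf-8')      # for the UTF-8 filename
latin1_url = percent_encode_path(url, 'latin-1')  # for the Latin-1 filename
```

Both results are pure ASCII str objects, so urlopen(utf8_url) and
urlopen(latin1_url) should at least get past the encode('ascii') step.
Which file each one reaches depends on how the server maps the escaped
bytes to filenames.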

----------
components: Unicode
messages: 73982
nosy: a.badger
severity: normal
status: open
title: urllib.request.urlopen does not handle non-ASCII characters
versions: Python 3.0

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3991>
_______________________________________

