[New-bugs-announce] [issue22248] urllib.request.urlopen raises exception when 30X-redirect url contains non-ascii chars

Tomas Groth report at bugs.python.org
Fri Aug 22 10:58:03 CEST 2014

New submission from Tomas Groth:

Running this simple test script produces the traceback shown below:

import urllib.request
page = urllib.request.urlopen('http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 493, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 676, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1065, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1093, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 957, in putrequest
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)

Using curl, we can see that the server redirects to a URL containing a non-ASCII character ("å"):
$ curl -vs "http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books" >DN1933
* Hostname was NOT found in DNS cache
*   Trying
* Connected to legacy.biblegateway.com ( port 80 (#0)
> GET /versions/?vid=DN1933&action=getVersionInfo HTTP/1.1
> User-Agent: curl/7.35.0
> Host: legacy.biblegateway.com
> Accept: */*
< HTTP/1.1 301 Moved Permanently
* Server nginx/1.4.7 is not blacklisted
< Server: nginx/1.4.7
< Date: Fri, 22 Aug 2014 08:35:30 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< X-Powered-By: PHP/5.5.7
< Set-Cookie: bg_id=1b9a80d5e6d545487cfd153d6df65c4e; path=/; domain=.biblegateway.com
< Set-Cookie: a9gl=0; path=/; domain=.biblegateway.com
< Location: http://legacy.biblegateway.com/versions/Dette-er-Biblen-på-dansk-1933/
* Connection #0 to host legacy.biblegateway.com left intact
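The failing step can be reproduced offline. This is a minimal sketch that assumes two things: the server sent the Location path as UTF-8 bytes, and http.client builds the request line as "<method> <path> HTTP/1.1". Because http.client decodes header values as ISO-8859-1, the two UTF-8 bytes of "å" become two separate characters. That matches the traceback, which reports positions 31-32 rather than a single position.

```python
# Reproduce the failing encode step without any network access.
# Assumption: the server sent the Location path as UTF-8 bytes.
location_bytes = "/versions/Dette-er-Biblen-på-dansk-1933/".encode("utf-8")

# http.client decodes header values as ISO-8859-1, so the two UTF-8 bytes
# of "å" (0xC3 0xA5) turn into two characters, "Ã" and "¥".
location = location_bytes.decode("iso-8859-1")

# Assumed to mirror how putrequest() formats the request line.
request_line = "GET %s HTTP/1.1" % location

try:
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't encode characters in position 31-32: ...
```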

When the redirect URL contains only ASCII characters, everything works as expected, for example with this URL: "http://legacy.biblegateway.com/versions/?vid=DNB1930&action=getVersionInfo#books"
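One possible workaround, sketched here as an assumption rather than an official fix, is to percent-encode the redirect target before urllib follows it. Re-encoding the header value as ISO-8859-1 recovers the bytes the server actually sent, and urllib.parse.quote() then escapes any non-ASCII bytes. The names fix_redirect_url and NonAsciiRedirectHandler are hypothetical. The handler overrides the existing HTTPRedirectHandler.redirect_request() hook.

```python
import string
import urllib.parse
import urllib.request


def fix_redirect_url(newurl):
    """Hypothetical helper: percent-encode non-ASCII bytes in a redirect URL.

    http.client decodes headers as ISO-8859-1, so encoding back with that
    codec yields the original bytes (e.g. UTF-8 0xC3 0xA5 -> "%C3%A5").
    Every character in string.punctuation, including '%', '/', ':' and '?',
    is treated as safe, so an already-escaped URL passes through unchanged.
    Assumes the URL only holds characters that ISO-8859-1 can represent.
    """
    return urllib.parse.quote(newurl, safe=string.punctuation,
                              encoding="iso-8859-1")


class NonAsciiRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that escapes the target before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Delegate to the stock logic, but with an ASCII-only URL.
        return super().redirect_request(req, fp, code, msg, headers,
                                        fix_redirect_url(newurl))
```

To use it, build an opener with urllib.request.build_opener(NonAsciiRedirectHandler) and call its open() method instead of urlopen(). Because the handler subclasses HTTPRedirectHandler, build_opener() uses it in place of the default redirect handler. Treating all of string.punctuation as safe keeps existing escapes intact, at the cost of passing a few characters such as '"' or '<' through unescaped.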

components: Library (Lib)
messages: 225651
nosy: tomasgroth
priority: normal
severity: normal
status: open
title: urllib.request.urlopen raises exception when 30X-redirect url contains non-ascii chars
type: behavior
versions: Python 3.4

Python tracker <report at bugs.python.org>