Internationalized domain names not working with URLopen
John Nagle
nagle at animats.com
Wed Jun 13 02:17:32 EDT 2012
I'm trying to open
http://пример.испытание
with
urllib2.urlopen(s1)
in Python 2.7 on Windows 7. This produces a Unicode exception:
>>> s1
u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
>>> fd = urllib2.urlopen(s1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\python27\lib\urllib2.py", line 394, in open
    response = self._open(req, data)
  File "C:\python27\lib\urllib2.py", line 412, in _open
    '_open', req)
  File "C:\python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\python27\lib\urllib2.py", line 1199, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\python27\lib\urllib2.py", line 1168, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "C:\python27\lib\httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "C:\python27\lib\httplib.py", line 988, in _send_request
    self.putheader(hdr, value)
  File "C:\python27\lib\httplib.py", line 935, in putheader
    hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
>>>
The HTTP library is trying to put the URL in the header as ASCII. Why
isn't "urllib2" handling that?
What does "urllib2" want? Percent escapes? Punycode?
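For what it's worth, here is a minimal sketch of one possible workaround, not something confirmed as the intended urllib2 usage: convert the URL to pure ASCII before calling urlopen, using Punycode (via Python's built-in "idna" codec) for the host name and UTF-8 percent escapes for the path and query. The helper name to_ascii_url is made up for illustration. It assumes an absolute URL, drops any user:password part, and leaves the fragment untouched.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import urlsplit, urlunsplit, quote  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2.7
    from urllib import quote


def to_ascii_url(url):
    """Hypothetical helper: return an ASCII-only form of a Unicode URL.

    Assumes an absolute URL with a host name. Any user:password part
    is dropped and the fragment is passed through unchanged.
    """
    parts = urlsplit(url)
    # Host name: Punycode ("xn--...") via the built-in 'idna' codec.
    host = parts.hostname.encode('idna').decode('ascii')
    if parts.port is not None:
        host = '%s:%d' % (host, parts.port)
    # Path and query: percent-escape the UTF-8 bytes. '%' is kept as
    # safe so escapes already present are not escaped a second time.
    path = quote(parts.path.encode('utf-8'), safe='/%')
    query = quote(parts.query.encode('utf-8'), safe='=&%')
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))


s1 = u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
print(to_ascii_url(s1))  # http://xn--e1afmkfd.xn--80akhbyknj4f
```

The converted string could then be passed to urllib2.urlopen in place of s1. Whether that is what urllib2 expects callers to do is exactly the open question here.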
John Nagle