[Python-Dev] urllib.request.urlopen struggling in Windows 7
Thom Ives
thom.ives at hp.com
Tue Nov 15 00:31:12 CET 2011
Previously, in Python 2.6, I made heavy use of urllib.urlopen to capture
web page content and then post-process the data I had downloaded.
Now those routines, and the new routines I am trying to use with Python 3.2, are
running into what seems to be a Windows-only (maybe even Windows 7-only) problem.
Using the following code with Python 3.2.2 (64-bit) on Windows 7 ...
import urllib.request
fp = urllib.request.urlopen(URL_string_that_I_use)
string = fp.read()
fp.close()
print(string.decode("utf8"))
I get the following message:
Traceback (most recent call last):
  File "TATest.py", line 5, in <module>
    string = fp.read()
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 553, in _read_chunked
    self._safe_read(2) # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)
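One possible workaround, offered only as a sketch: catch http.client.IncompleteRead and fall back to whatever bytes the exception carries in its `partial` attribute. The helper names `read_body` and `fetch` are hypothetical and not part of urllib. Whether `partial` holds anything useful depends on the Python version and on where the read failed; the traceback above reports 0 bytes read, so on that setup it may well be empty.

```python
import http.client
import urllib.request


def read_body(response):
    """Read a response body, tolerating a stream that ends early.

    `response` is any object with a read() method, such as the
    object returned by urllib.request.urlopen().
    """
    try:
        return response.read()
    except http.client.IncompleteRead as exc:
        # exc.partial holds the bytes attached to the exception.
        # It can be empty, depending on where the read failed.
        return exc.partial


def fetch(url):
    """Hypothetical convenience wrapper: open, read tolerantly, close."""
    response = urllib.request.urlopen(url)
    try:
        return read_body(response)
    finally:
        response.close()
```

This does not fix a server that closes a chunked response early; it only keeps the script from aborting and lets the caller decide what to do with a short body.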
Using the following code instead ...
import urllib.request
fp = urllib.request.urlopen(URL_string_that_I_use)
for Line in fp:
    print(Line.decode("utf8").rstrip('\n'))
fp.close()
I get a fair amount of the web page's content, but then the rest of the capture
is thwarted by ...
Traceback (most recent call last):
  File "TATest.py", line 9, in <module>
    for Line in fp:
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 545, in _read_chunked
    self._safe_read(2) # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)
Trying to read another page yields ...
Traceback (most recent call last):
  File "TATest.py", line 11, in <module>
    print(Line.decode("utf8").rstrip('\n'))
  File "d:\python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 21: character maps to <undefined>
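The UnicodeEncodeError above is raised in cp1252.py while print() encodes the already-decoded text for output, so it concerns the console encoding rather than the download itself. A minimal sketch of one way around it, assuming it is acceptable to substitute a "?" for characters the console cannot show (the helper name `console_safe` is hypothetical):

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the target encoding cannot represent replaced.

    Defaults to the encoding of sys.stdout, falling back to ASCII
    if that is not available.
    """
    encoding = encoding or sys.stdout.encoding or "ascii"
    # errors="replace" substitutes "?" instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# U+0092 has no mapping in cp1252, as in the traceback above.
print(console_safe("It\x92s here", "cp1252"))
```

In the loop shown earlier, this would mean calling print(console_safe(Line.decode("utf8").rstrip('\n'))) in place of the bare print.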
I do believe this is a Windows issue, but can Python be made more robust in
dealing with whatever is causing it? When we try similar code on Linux, we do
not encounter the problem.