UnicodeDecodeError when fetching a web page
Peter Otten
__peter__ at web.de
Tue May 25 16:10:38 EDT 2010
Barry wrote:
> On 25 May, 21:39, Philip Semanchuk <phi... at semanchuk.com> wrote:
>> On May 25, 2010, at 3:13 PM, Barry wrote:
>>
>>
>>
>> > Hi,
>>
>> > The code below is giving me the error:
>>
>> > Traceback (most recent call last):
>> > File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
>> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
>> > unexpected code byte
>>
>> > What am I doing wrong?
>>
>> > Thanks,
>>
>> > Barry
>>
>> > request = urllib.request.Request(
>> >     url='http://en.wiktionary.org/wiki/baby',
>> >     headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'})
>>
>> > response = urllib.request.urlopen(request)
>> > html = response.read().decode('utf-8')
>>
>> Well, for starters you're assuming that the response content is in
>> UTF-8. You need to examine the Content-Type header to see what the
>> encoding is. If it's not UTF-8, there's your problem.
>>
>> HTH
>> P
>
> The content type is utf-8:
>
> Date: Wed, 19 May 2010 19:17:39 GMT
> Server: Apache
> Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
> Content-Language: en
> Vary: Accept-Encoding,Cookie
> Last-Modified: Wed, 19 May 2010 10:10:34 GMT
> Content-Encoding: gzip
But the data is gzipped. You have to uncompress it before decoding.
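
A gzip stream starts with the bytes 0x1f 0x8b, which fits the "byte 0x8b in
position 1" in the traceback. Below is a minimal sketch of decompressing
before decoding, assuming Python 3.2 or later (for gzip.decompress). The
helper names decode_body and fetch_text are made up for illustration, and
falling back to UTF-8 when the Content-Type header names no charset is an
assumption as well.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding, charset="utf-8"):
    """Turn raw response bytes into text, gunzipping first if needed.

    Hypothetical helper: content_encoding is the value of the
    Content-Encoding response header (or None/"" if absent).
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url, user_agent="Mozilla/5.0"):
    """Fetch url and return its body as str (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        content_encoding = response.headers.get("Content-Encoding", "")
        # Charset from the Content-Type header; UTF-8 is an assumed fallback.
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(raw, content_encoding, charset)
```

With these definitions, html = fetch_text('http://en.wiktionary.org/wiki/baby')
would take the place of the read().decode('utf-8') line in the original code.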
Peter