UnicodeDecodeError when fetching web page

Barry magnus.moraberg at gmail.com
Tue May 25 16:00:23 EDT 2010


On 25 May, 21:39, Philip Semanchuk <phi... at semanchuk.com> wrote:
> On May 25, 2010, at 3:13 PM, Barry wrote:
>
>
>
> > Hi,
>
> > The code below is giving me the error:
>
> > Traceback (most recent call last):
> >  File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
> > unexpected code byte
>
> > What am I doing wrong?
>
> > Thanks,
>
> > Barry
>
> > request = urllib.request.Request(
> >     url='http://en.wiktionary.org/wiki/baby',
> >     headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'})
>
> > response = urllib.request.urlopen(request)
> > html = response.read().decode('utf-8')
>
> Well, for starters you're assuming that the response content is in  
> UTF-8. You need to examine the Content-Type header to see what the  
> encoding is. If it's not UTF-8, there's your problem.
>
> HTH
> P

The content type is utf-8:

Date: Wed, 19 May 2010 19:17:39 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
Last-Modified: Wed, 19 May 2010 10:10:34 GMT
Content-Encoding: gzip
Content-Length: 25247
Content-Type: text/html; charset=utf-8
X-Cache: HIT from sq61.wikimedia.org
X-Cache-Lookup: HIT from sq61.wikimedia.org:3128
Age: 520549
X-Cache: HIT from amssq32.esams.wikimedia.org
X-Cache-Lookup: HIT from amssq32.esams.wikimedia.org:3128
X-Cache: MISS from amssq37.esams.wikimedia.org
X-Cache-Lookup: MISS from amssq37.esams.wikimedia.org:80
Connection: close
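
The headers above can also be checked from code rather than by eye. This is a minimal sketch, assuming that response.headers behaves like an email.message.Message (which is the case for urllib.request in Python 3). The helper name declared_charset is made up for illustration, and a hand-built Message stands in for the real response so the example runs offline:

```python
from email.message import Message


def declared_charset(headers, default="utf-8"):
    """Return the charset named in Content-Type, or `default` if none is given."""
    # get_content_charset() returns None when no charset parameter is present.
    return headers.get_content_charset() or default


# Offline stand-in for response.headers (hypothetical; values copied from above).
headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"
headers["Content-Encoding"] = "gzip"

print(declared_charset(headers))        # charset taken from Content-Type
print(headers.get("Content-Encoding"))  # any compression applied to the body
```

With a live response, the same two values would come from declared_charset(response.headers) and response.headers.get("Content-Encoding").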

Could the page itself be corrupt? If so, how can I make the best of
the situation? Many other pages from this server work without any problem.
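
One detail in the headers may be relevant here: Content-Encoding is gzip, and a gzip stream starts with the bytes 0x1f 0x8b, which would put 0x8b at position 1, exactly where the traceback complains. The following is a minimal sketch, assuming the body really is gzip-compressed. The helper name decode_body is hypothetical, and the sample data is synthetic so the example runs without network access:

```python
import gzip


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Decompress `raw` if it looks gzip-encoded, then decode it to str."""
    is_gzip = (content_encoding or "").lower() == "gzip"
    # Fall back to the gzip magic number in case the header is missing.
    if is_gzip or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Synthetic stand-in for response.read() on a gzip-compressed page.
sample = gzip.compress("<html>bébé</html>".encode("utf-8"))

print(hex(sample[1]))               # second byte of the gzip magic number
print(decode_body(sample, "gzip"))  # decompressed, then decoded as UTF-8
```

Applied to the original code, this would look something like html = decode_body(response.read(), response.headers.get("Content-Encoding")). If the decompressed bytes still fail to decode, then the page content itself would be the next thing to suspect.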

Thanks!

Barry


