UnicodeDecodeError when fetching web page

Kushal Kumaran kushal.kumaran+python at gmail.com
Thu May 27 01:00:14 EDT 2010


On Wed, May 26, 2010 at 11:40 PM, Rob Williscroft <rtw at rtw.me.uk> wrote:
> Kushal Kumaran wrote in news:1274889564.2339.16.camel at nitrogen in
> gmane.comp.python.general:
>
>> On Tue, 2010-05-25 at 20:12 +0000, Rob Williscroft wrote:
>>> Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3
>>> @m21g2000vbr.googlegroups.com in gmane.comp.python.general:
>>>
>>> > Hi,
>>> >
>>> > The code below is giving me the error:
>>> >
>>> > Traceback (most recent call last):
>>> >   File "C:\Users\Administratör\Desktop\test.py", line 4, in
>>> >   <module>
>>> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position
>>> > 1: unexpected code byte
>>> >
>>> >
>>> > What am I doing wrong?
>>>
>>> It may not be you: en.wiktionary.org is sending gzip-encoded
>>> content back. It seems to do this even if you set
>>> the Accept header, as in:
>>>
>>> request.add_header( "Accept", "text/html" )
>>>
>>> But maybe I'm not doing it correctly.
>>>
>> You need the Accept-Encoding: identity header.
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
>
> Thanks. Following this, I changed the line to:
>
> request.add_header( "Accept-Encoding", "identity" )
>
> but it made no difference: en.wiktionary.org still sent back
> a gzip-encoded response.
>

A known problem, I guess... https://bugzilla.wikimedia.org/show_bug.cgi?id=7098

You'll just have to handle the gzipped data.
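
A minimal sketch of what handling the gzipped data could look like,
assuming Python 3 and urllib.request (the original script is not shown
in this thread, so the names decode_body and fetch_text are
hypothetical). The 0x8b at position 1 in the traceback matches the
second byte of the gzip magic number, so the sketch checks both the
Content-Encoding header and those two leading bytes:

```python
import gzip
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return text from a response body, gunzipping it first if needed.

    Hypothetical helper: decompresses when the server declares gzip or
    when the body starts with the gzip magic bytes.
    """
    declared_gzip = (content_encoding or "").lower() == "gzip"
    if declared_gzip or raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url):
    """Fetch a URL and return its body as text (hypothetical wrapper)."""
    request = urllib.request.Request(url)
    # Ask for an uncompressed body; some servers ignore this.
    request.add_header("Accept-Encoding", "identity")
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        encoding = response.headers.get("Content-Encoding")
        # Fall back to UTF-8 if the server does not name a charset.
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(raw, encoding, charset)
```

Checking the magic bytes as well as the header is a defensive choice
for servers that compress without saying so; it could be dropped if the
header is reliable.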

-- 
regards,
kushal


