Deflate with urllib2... (solved)

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Fri Sep 19 11:25:09 EDT 2008


On Thu, 18 Sep 2008 23:29:30 -0300, Sam <samslists at gmail.com> wrote:

> On Sep 18, 2:10 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>> On Tue, 16 Sep 2008 21:58:31 -0300, Sam <samsli... at gmail.com> wrote:
>> The code is correct - try with another server. I tested it with a
>> lighttpd server and it worked fine.
>
> Gabriel...
>
> I found a bunch of servers to test it on.  It fails on every server I
> could find (sans one).
>
> Here are the ones it fails on:
> slashdot.org
> hotmail.com
> godaddy.com
> linux.com
> lighttpd.net
>
> I did manage to find one webserver it succeeded on---that is
> kenrockwel.com --- a domain squatter with a typoed domain of one of my
> favorite photographer's websites (the actual website should be
> kenrockwell.com)
>
> This squatter's site is indeed running lighttpd---but it appears to be
> an earlier version, because the official lighttpd site fails on this
> test.
>
> We have all the major web servers failing the test:
> * Apache 1.3
> * Apache 2.2
> * Microsoft-IIS/6.0
> * lighttpd/1.5.0
>
> So I think it's the python side that is wrong, regardless of what the
> standard is.

I've found the problem. The zlib header (2 bytes) is missing; the data
begins directly with the raw compressed stream. You can still decode it
if you pass a negative value for wbits:

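       try:
           # Normal case: zlib stream with its 2-byte header (RFC 1950).
           data = zlib.decompress(data)
       except zlib.error:
           # Header missing: decode as a raw deflate stream (negative wbits).
           data = zlib.decompress(data, -zlib.MAX_WBITS)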

Note that sending the stream without its header clearly violates
RFC 1950: the header is *not* optional.
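
For anyone who wants to try the fallback without hitting a live server,
here is a minimal self-contained sketch. The helper name inflate() and the
sample payload are mine, made up for illustration; they are not from Sam's
test script. It builds both a zlib-wrapped stream and a raw deflate stream
from the same data and checks that the same function decodes both:

```python
import zlib


def inflate(data):
    """Decompress a "deflate" body, with or without the zlib header."""
    try:
        # Standard case: zlib stream (RFC 1950) with its 2-byte header.
        return zlib.decompress(data)
    except zlib.error:
        # Fallback: raw deflate stream (RFC 1951) with no header.
        # A negative wbits tells zlib not to expect the header.
        return zlib.decompress(data, -zlib.MAX_WBITS)


# Hypothetical sample payload, only for exercising both code paths.
payload = b"hello, deflate! " * 50

# zlib-wrapped stream, as RFC 1950 requires.
wrapped = zlib.compress(payload)

# Raw deflate stream, like the one the servers in this thread send.
compressor = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(payload) + compressor.flush()

assert inflate(wrapped) == payload
assert inflate(raw) == payload
```

Trying the strict decode first keeps well-behaved servers on the normal
path; the negative-wbits call only runs when zlib rejects the header.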

BTW, the curl developers had this same problem some time ago  
<http://curl.haxx.se/mail/lib-2005-12/0130.html> and the proposed solution  
is the same as above.

This is the output from your test script modified as above. (Note that in  
some cases, the compressed stream is larger than the uncompressed data):

Trying:  http://slashdot.org
   http://slashdot.org - Apache/1.3.41 (Unix) mod_perl/1.31-rc4 (deflate) len(deflate)=73174 len(gzip)=73208
   Able to decompress...went from 73174 to 73073.

Trying:  http://www.hotmail.com
   http://www.hotmail.com - Microsoft-IIS/6.0 (deflate) len(deflate)=1609 len(gzip)=1635
   Able to decompress...went from 1609 to 3969.

Trying:  http://www.godaddy.com
   http://www.godaddy.com - Microsoft-IIS/6.0 (deflate) len(deflate)=40646 len(gzip)=157141
   Able to decompress...went from 40646 to 157141.

Trying:  http://www.linux.com
   http://www.linux.com - Apache/2.2.8 (Unix) PHP/5.2.5 (deflate) len(deflate)=52862 len(gzip)=52880
   Able to decompress...went from 52862 to 52786.

Trying:  http://www.lighttpd.net
   http://www.lighttpd.net - lighttpd/1.5.0 (deflate) len(deflate)=5669 len(gzip)=5687
   Able to decompress...went from 5669 to 15746.

Trying:  http://www.kenrockwel.com
   http://www.kenrockwel.com - lighttpd (deflate) len(deflate)=414 len(gzip)=426
   Able to decompress...went from 414 to 744.

-- 
Gabriel Genellina



