Deflate with urllib2... (solved)
gagsl-py2 at yahoo.com.ar
Fri Sep 19 17:25:09 CEST 2008
On Thu, 18 Sep 2008 23:29:30 -0300, Sam <samslists at gmail.com> wrote:
> On Sep 18, 2:10 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
>> On Tue, 16 Sep 2008 21:58:31 -0300, Sam <samsli... at gmail.com> wrote:
>> The code is correct - try with another server. I tested it with a
>> LightHTTPd server and worked fine.
> I found a bunch of servers to test it on. It fails on every server I
> could find (sans one).
> Here are the ones it fails on:
> I did manage to find one webserver it succeeded on---that is
> kenrockwel.com --- a domain squatter with a typoed domain of one of my
> favorite photographer's websites (the actual website should be
> This squatter's site is indeed running lighttpd---but it appears to be
> an earlier version, because the official lighttpd site fails on this
> We have all the major web servers failing the test:
> * Apache 1.3
> * Apache 2.2
> * Microsoft-IIS/6.0
> * lighttpd/1.5.0
> So I think it's the python side that is wrong, regardless of what the
> standard is.
I've found the problem. The zlib header (2 bytes) is missing; the data
begins directly with the compressed stream. You can decode it if you pass
a negative value for wbits:
import zlib
# was: data = zlib.decompress(data)
data = zlib.decompress(data, -zlib.MAX_WBITS)  # raw deflate, no zlib header
Note that this clearly violates RFC 1950 (the format that HTTP's "deflate"
content coding refers to): the header is *not* optional.
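For reference, the 2 missing bytes are the CMF and FLG fields of RFC 1950:
the low 4 bits of CMF must be 8 (deflate), the high 4 bits must be at most
7, and CMF*256 + FLG must be a multiple of 31. Below is a minimal sketch of
a check based on those rules, assuming Python 3; the name has_zlib_header
is hypothetical. It is only a heuristic: a raw deflate stream can
occasionally pass it by chance, so it does not replace actually trying to
decompress.

```python
import zlib


def has_zlib_header(data: bytes) -> bool:
    """Heuristic check for the 2-byte RFC 1950 header (CMF, FLG)."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    method = cmf & 0x0F                     # CM: 8 means deflate
    window = cmf >> 4                       # CINFO: log2(window) - 8, max 7
    check_ok = (cmf * 256 + flg) % 31 == 0  # FCHECK makes this divisible by 31
    return method == 8 and window <= 7 and check_ok


wrapped = zlib.compress(b"hello")           # zlib format: header present
raw_obj = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = raw_obj.compress(b"hello") + raw_obj.flush()  # raw deflate: no header
```

With these rules, wrapped passes the check. The short raw stream fails it,
because its first bit (the "final block" flag) makes CM odd.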
BTW, the curl developers had this same problem some time ago
<http://curl.haxx.se/mail/lib-2005-12/0130.html> and the proposed solution
is the same as above.
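Some servers do send the zlib wrapper, so one option is to try the standard
format first and fall back to raw deflate only if that fails. Here is a
minimal sketch, assuming Python 3 and a complete response body already in
memory; inflate is a hypothetical helper, not part of urllib2 or zlib.

```python
import zlib


def inflate(body: bytes) -> bytes:
    """Decode an HTTP 'deflate' body, tolerating a missing zlib wrapper."""
    try:
        # Conforming case: RFC 1950 zlib format (header + Adler-32 trailer).
        return zlib.decompress(body)
    except zlib.error:
        # Fallback: raw RFC 1951 stream; negative wbits means "no header".
        return zlib.decompress(body, -zlib.MAX_WBITS)
```

Falling back only on zlib.error keeps conforming servers on the checked
path, where the Adler-32 trailer is verified. The raw path cannot verify
integrity, because a bare deflate stream carries no checksum.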
This is the output from your test script, modified as above. (Note that in
some cases the compressed stream is larger than the uncompressed data.)
http://slashdot.org - Apache/1.3.41 (Unix) mod_perl/1.31-rc4 (deflate)
Able to decompress...went from 73174 to 73073.
http://www.hotmail.com - Microsoft-IIS/6.0 (deflate) len(deflate)=1609
Able to decompress...went from 1609 to 3969.
http://www.godaddy.com - Microsoft-IIS/6.0 (deflate) len(deflate)=40646
Able to decompress...went from 40646 to 157141.
http://www.linux.com - Apache/2.2.8 (Unix) PHP/5.2.5 (deflate)
Able to decompress...went from 52862 to 52786.
http://www.lighttpd.net - lighttpd/1.5.0 (deflate) len(deflate)=5669
Able to decompress...went from 5669 to 15746.
http://www.kenrockwel.com - lighttpd (deflate) len(deflate)=414
Able to decompress...went from 414 to 744.
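In Python 3, urllib2 became urllib.request. The rough sketch below shows
how a similar test could be written there. It is not the original test
script, which isn't shown. The names fetch_deflate and decode_body are
hypothetical, and only the "deflate" and "identity" encodings are handled.

```python
import urllib.request
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo a Content-Encoding of 'deflate'; pass 'identity' through."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding != "deflate":
        raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
    try:
        return zlib.decompress(body)                   # RFC 1950 (zlib-wrapped)
    except zlib.error:
        return zlib.decompress(body, -zlib.MAX_WBITS)  # raw RFC 1951 stream


def fetch_deflate(url: str):
    """Ask for deflate; return (Server header, body length, decoded body)."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "deflate"})
    # urllib.request does not undo Content-Encoding itself, so decode manually.
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
        server = resp.headers.get("Server", "?")
        encoding = resp.headers.get("Content-Encoding", "")
    return server, len(body), decode_body(body, encoding)
```

Calling fetch_deflate(url) for each site and printing the body length next
to len() of the decoded data would give lines comparable to the output above.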