Zlib: correct checksum but error decompressing

Wed Aug 26 19:53:28 EDT 2009

Paul Rubin <http> writes:

> 
> Andre <andre.cohen <at> gmail.com> writes:
> > I have been trying to solve this issue for a while now. I receive data
> > from a TCP connection which is compressed.
> 
> Are you sure it is compressed with zlib?  If yes, does it include the
> standard zlib header?  Some applications save a few bytes by stripping
> the header.  See the zlib doc page for how to deal with that, there is
> a flag that causes the header check to be skipped on decompression if
> you pass a negative number.  That's the first thing I would try.

Short answer:

Try this:
    import zlib
    uncompressed = zlib.decompress(incoming_data, -15)
If that doesn't work:
    print(repr(incoming_data[:30]))
    # post the results here

Longer answer:

A zlib stream consists of a deflate stream preceded by
a 2-byte header and followed by a 4-byte Adler32
checksum of the original data.
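To make that layout concrete, here is a small Python 3 sketch (the sample data is made up) that splits the output of zlib.compress() into its three parts. It checks that the trailer is the big-endian Adler-32 of the uncompressed bytes, and that the middle part decompresses as a bare deflate stream when a negative wbits is passed.

```python
import struct
import zlib

original = b"hello, hello, hello, zlib!" * 4   # arbitrary sample data
stream = zlib.compress(original)

header = stream[:2]          # 2-byte zlib header (CMF, FLG)
deflate_body = stream[2:-4]  # the deflate stream proper
trailer = stream[-4:]        # Adler-32 of the ORIGINAL data, big-endian

# The trailer is the Adler-32 checksum of the uncompressed bytes
assert trailer == struct.pack(">I", zlib.adler32(original) & 0xFFFFFFFF)

# The middle part is a bare deflate stream; negative wbits tells zlib
# not to expect (or check) a header and trailer
assert zlib.decompress(deflate_body, -15) == original

print(header.hex(), trailer.hex())
```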

The problem usually arises not from a desire to save 6 bytes
but from the compounding of 2 mistakes:

Mistake (1) is in the HTTP protocol.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
The "deflate" content coding should have been called "zlib".
Read this and weep:
"""deflate The "zlib" format defined in RFC 1950 [31] in
combination with the "deflate" compression mechanism
described in RFC 1951 [29]."""

Mistake (2) happens when software implementers read only
the first word of the above quote and send a bare
deflate stream, without the zlib header and trailer.
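A minimal Python 3 sketch of what the receiver sees in each case (the body text is invented for illustration): a zlib stream, which is what RFC 2616 asks for, and a bare deflate stream, which is what some servers send. With the default wbits, zlib.decompress() rejects the bare stream with a zlib.error; passing -15 accepts it.

```python
import zlib

original = b"some response body " * 10   # invented sample payload

# What RFC 2616 asks for under "deflate": a zlib stream (RFC 1950)
correct = zlib.compress(original)

# What some servers send instead: a bare deflate stream (RFC 1951)
co = zlib.compressobj(6, zlib.DEFLATED, -15)
bare = co.compress(original) + co.flush()

try:
    zlib.decompress(bare)          # default wbits expects a zlib header
except zlib.error as exc:
    print("default wbits failed:", exc)

assert zlib.decompress(correct) == original
assert zlib.decompress(bare, -15) == original
```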

A reader can handle both possibilities by checking for a
(usual, default) zlib header:

    data[:1] == b'\x78' and (ord(data[1:2]) + 0x7800) % 31 == 0
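Putting that check to work, here is a sketch of a reader that accepts either form. It assumes Python 3 bytes input, and the helper names looks_like_zlib and inflate are hypothetical. Like the one-liner above, it only recognises the usual 0x78 first byte (deflate with a 32K window), so it is a heuristic rather than a full header parser.

```python
import zlib

def looks_like_zlib(data):
    # Hypothetical helper: True if data starts with the usual zlib header,
    # i.e. CMF == 0x78 and (CMF * 256 + FLG) is a multiple of 31
    return len(data) >= 2 and data[0] == 0x78 and (data[0] * 256 + data[1]) % 31 == 0

def inflate(data):
    # Hypothetical helper: decompress a zlib stream or a bare deflate stream
    if looks_like_zlib(data):
        return zlib.decompress(data)       # header and Adler-32 are checked
    return zlib.decompress(data, -15)      # raw deflate, no header/trailer
```

For example, inflate(zlib.compress(b"abc")) and inflate() on the same data compressed with wbits=-15 should both return b"abc".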

HTH,
John