removing the header from a gzip'd string
Fredrik Lundh
fredrik at pythonware.com
Sun Dec 24 06:15:34 EST 2006
debarchana.ghosh at gmail.com wrote:
> Essentially, they note that the NCD does not always behave like a
> metric and one reason they put forward is that this may be due to the
> size of the header portion (they were using the command line gzip and
> bzip2 programs) compared to the strings being compressed (which are on
> average 48 bytes long).
gzip datastreams have a real header, with a file type identifier,
optional filenames, comments, and a bunch of flags.
but even if you strip that off (which is basically what happens if you
use zlib.compress instead of gzip), I doubt you'll get representative
"compressibility" metrics on strings that short. like most other
compression algorithms, gzip and bzip2 are designed for much larger
datasets.
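
To see how much of the output is framing rather than compressed data, a rough sketch along these lines may help. It compresses the same bytes three ways: as a gzip stream (at least a 10-byte header plus an 8-byte CRC-32/length trailer), as a zlib stream (2-byte header plus a 4-byte Adler-32 checksum), and as a raw deflate stream with no wrapper at all. The function name compressed_sizes and the sample string are made up for illustration, and the exact byte counts depend on the input and on the zlib build, so none are quoted here.

```python
import gzip
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the size of `data` under three framings of deflate.

    Hypothetical helper for illustration; not part of any library.
    """
    # A negative wbits value asks zlib for a raw deflate stream,
    # i.e. no header and no checksum trailer.
    raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    raw_bytes = raw.compress(data) + raw.flush()
    return {
        "input": len(data),
        "gzip": len(gzip.compress(data)),      # gzip header + CRC-32 + size
        "zlib": len(zlib.compress(data, 9)),   # zlib header + Adler-32
        "raw_deflate": len(raw_bytes),         # compressed data only
    }


if __name__ == "__main__":
    # A 48-byte sample, matching the average length mentioned above.
    sample = b"the quick brown fox jumps over the lazy dog, ok?"
    for name, size in compressed_sizes(sample).items():
        print(f"{name:12s} {size:4d} bytes")
```

On inputs this short the fixed framing bytes are a large fraction of the total, and even the raw deflate stream has little redundancy to work with, so the sizes say more about the container than about the string.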
</F>