zipped socket
Bryan Olson
fakeaddress at nowhere.org
Tue Aug 9 21:30:56 EDT 2005
jepler at unpythonic.net wrote:
> As far as I know, there is not a prefabbed solution for this problem. One
> issue that you must solve is the issue of buffering (when must some data you've
> written to the compressor really go out to the other side) and the issue of
> what to do when a read() or recv() reads gzipped bytes but these don't produce any
> additional unzipped bytes---this is a problem because normally a read() that
> returns '' indicates end-of-file.
>
> If you only work with whole files at a time, then one easy thing to do is use
> the 'zlib' encoding:
> >>> "abc".encode("zlib")
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> _.decode("zlib")
> 'abc'
> ... but because zlib isn't self-delimiting, this won't work if you want to
> write() multiple times, or if you want to read() less than the full file
That's basically a solved problem; zlib does have a kind of
self-delimiting mechanism. The key is the 'flush' method of the
compression object:
    some_send_function(compressor.flush(zlib.Z_SYNC_FLUSH))
The Python module doc is unclear/wrong on this, but zlib.h
explains:
    If the parameter flush is set to Z_SYNC_FLUSH, all pending
    output is flushed to the output buffer and the output is
    aligned on a byte boundary, so that the decompressor can get
    all input data available so far.
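A minimal sketch of that pattern, assuming Python 3 and using in-memory
byte strings in place of a real socket. The helper names pack() and
unpack() are hypothetical, not part of zlib; only compressobj,
decompressobj, compress, decompress, flush and Z_SYNC_FLUSH come from
the zlib module:

```python
import zlib

# One long-lived compressor/decompressor pair per connection direction.
compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

def pack(message):
    # compress() may hold data back in its internal buffer;
    # flush(Z_SYNC_FLUSH) forces out everything written so far,
    # aligned on a byte boundary, without ending the stream.
    return compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)

def unpack(chunk):
    # The receiver feeds whatever bytes arrived into the same
    # decompressor object; after a sync flush, all data sent so far
    # is available.
    return decompressor.decompress(chunk)

messages = [b"hello", b"hello again", b"goodbye"]
received = [unpack(pack(m)) for m in messages]
print(received)
```

On a real socket a recv() can return only part of a flushed chunk, so the
receiver may get fewer bytes back (possibly none) until the rest arrives.
That is why an empty result from decompress() should not be read as
end-of-file.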
There's also Z_FULL_FLUSH, which additionally resets the compression
dictionary. For a stream socket, we'd usually want to keep the
dictionary, since that's what gives us the compression. The
Python doc states:
    Z_SYNC_FLUSH and Z_FULL_FLUSH allow compressing further
    strings of data and are used to allow partial error recovery
    on decompression
That's not correct. Z_FULL_FLUSH allows recovery after errors,
but Z_SYNC_FLUSH only ensures that everything fed to the
compressor so far can come out of the decompressor.
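As a rough illustration of why keeping the dictionary matters, the
sketch below (assuming Python 3) sends the same payload twice and
measures the size of the second flushed chunk under each flush mode.
The helper second_chunk_size() is a hypothetical name introduced only
for this comparison:

```python
import zlib

def second_chunk_size(flush_mode):
    """Return the compressed size of a payload sent a second time."""
    c = zlib.compressobj()
    # 256 distinct byte values: hard to compress on its own, but a
    # perfect repeat once it is already in the dictionary.
    payload = bytes(range(256))
    # First message: output discarded, we only care about the state
    # the compressor is left in afterwards.
    c.compress(payload) + c.flush(flush_mode)
    # Second message: identical payload.
    second = c.compress(payload) + c.flush(flush_mode)
    return len(second)

sync_size = second_chunk_size(zlib.Z_SYNC_FLUSH)
full_size = second_chunk_size(zlib.Z_FULL_FLUSH)
print("Z_SYNC_FLUSH:", sync_size, "bytes; Z_FULL_FLUSH:", full_size, "bytes")
```

With Z_SYNC_FLUSH the second chunk can refer back to the first, so it
comes out much smaller; with Z_FULL_FLUSH the history is discarded and
the payload has to be encoded from scratch.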
--
--Bryan