Copying zlib compression objects
I'm writing a program in Python that creates tar files of a certain maximum size (to fit onto CD/DVD). One of the problems I'm running into is that when using compression, it's pretty much impossible to determine whether a file, once added to an archive, will cause the archive size to exceed the maximum.

I believe that to do this properly, you need to copy the state of the tar file (basically the current file offset as well as the state of the compression object), then add the file. If the new size of the archive exceeds the maximum, you need to restore the original state.

The critical part is being able to copy the compression object. Without compression it is trivial to determine whether a given file will "fit" inside the archive. With compression, the compression ratio of a file depends partially on all the data that has been compressed before it.

The current implementation in the standard library does not allow you to copy these compression objects in a useful way, so I've made some minor modifications (patch attached) to the standard 2.4.2 library:

- Add a copy() method to the zlib compression object. This returns a new compression object with the same internal state. I named it copy() to keep it consistent with things like sha.copy().
- Add snapshot() / restore() methods to GzipFile and TarFile. These work only in write mode. snapshot() returns a state object; passing that state object to restore() restores the GzipFile / TarFile to the state it represents.

Future work:

- Decompression objects could use a copy() method too.
- Add support for copying bzip2 compression objects.

Although this patch isn't complete, does this seem like a good approach?

Cheers,
Chris
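The try-then-roll-back idea can be sketched with exactly the copy() method the patch proposes (zlib compression objects in later Python versions have this method). The helper below is illustrative, not part of the patch: try_add, written, and max_size are hypothetical names, and a sync flush is used so the trial output size can be measured.

```python
import zlib

def try_add(comp, written, data, max_size):
    """Tentatively compress `data`; commit only if the archive stays
    within `max_size` bytes.  `comp` is a zlib compression object and
    `written` counts compressed bytes already emitted."""
    trial = comp.copy()  # duplicate the compressor's internal state
    # Sync-flush the trial so all of its output is emitted and measurable;
    # on commit, the flushed trial becomes the live compressor.
    out = trial.compress(data) + trial.flush(zlib.Z_SYNC_FLUSH)
    if written + len(out) > max_size:
        return comp, written, None          # roll back: discard the trial
    return trial, written + len(out), out   # commit the trial's state

comp = zlib.compressobj()
data = b"hello world" * 50
comp, written, out = try_add(comp, 0, data, 10_000)        # fits; out holds bytes
comp, written, refused = try_add(comp, written,
                                 b"x" * 10**6, written + 10)  # too big; refused is None
```

On rollback the original compression object is untouched, since only the copy ever saw the new data. Before closing the archive, the stream still has to be finished with a final comp.flush().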
Please submit your patch to SourceForge.

On 2/17/06, Chris AtLee <chris@atlee.ca> wrote:
[original message quoted above in full]
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
On 2/17/06, Guido van Rossum <guido@python.org> wrote:
Please submit your patch to SourceForge.
I've submitted the zlib patch as patch #1435422. I added some test cases to test_zlib.py and documented the new methods.

I'd like to test my gzip / tarfile changes more before creating a patch for them, but I'm interested in any feedback on the idea of adding snapshot() / restore() methods to the GzipFile and TarFile classes.

It doesn't look like the underlying bz2 library supports copying compression / decompression streams, so for now it's impossible to make corresponding changes to the bz2 module.

I also noticed that tarfile reimplements the gzip file format when dealing with streams. Would it make sense to refactor some of the gzip.py code to expose the methods that read/write the gzip file header, and have the tarfile module use those methods?

Cheers,
Chris
participants (2): Chris AtLee, Guido van Rossum