Add a file to a compressed tarfile

Josiah Carlson jcarlson at uci.edu
Fri Nov 5 13:19:33 EST 2004


eddie at holyrood.ed.ac.uk (Eddie Corns) wrote:
> 
> Dennis Hotson <djdennie69 at hotmail.com> writes:
> 
> 
> >I'm currently trying to read all of the files inside the tarfile... then
> >writing them all back. Bit of a kludge, but it should work..
> 
> There isn't really any other way.  A tar file is terminated by two empty
> blocks.  In order to append to a tar file you simply append a new tar file two
> blocks from the end of the original.  If it was uncompressed you just seek
> back from the end and write but if it's compressed you can't find that point
> without decompressing[1].  In some cases a more time-efficient but less
> space-efficient method would be to compress the individual files in a
> directory and then tar them up before the final distribution (or whatever
> you do with your tar file).
> 
> Eddie
> 
> [1] I think, unless there's a clever way of just decompressing the last few
>     blocks.

I am not aware of any such method.  I am fairly certain gzip (and the
associated zlib) does the following:

while bytes remaining:
    reset/initialize state
    while state is not crappy and bytes remaining:
        compress portion of remaining bytes
        update state

Even if one could discover the last reset/initialization of state, one
would still need to decompress the data from then on in order to
discover the two empty blocks.
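
For what it's worth, the "read everything and write it all back" approach
can be sketched with the standard tarfile module.  This is only a rough
sketch, not a definitive implementation: the function name
add_to_compressed_tar and its parameters are made up for illustration, and
it assumes the new member is small enough to pass in as bytes.

```python
import io
import os
import tarfile
import tempfile


def add_to_compressed_tar(archive_path, name, data):
    """Rewrite a .tar.gz so that it gains one extra member.

    archive_path, name and data are hypothetical parameters: the path of an
    existing gzip-compressed tar file, the member name to add, and its bytes.
    """
    # Write the new archive next to the old one so the final rename stays
    # on the same filesystem.
    directory = os.path.dirname(os.path.abspath(archive_path))
    tmp_fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(tmp_fd)
    try:
        with tarfile.open(archive_path, "r:gz") as src, \
                tarfile.open(tmp_path, "w:gz") as dst:
            # Stream every existing member across; this decompresses and
            # recompresses the whole archive, which is the unavoidable cost.
            for member in src:
                fileobj = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, fileobj)
            # Append the new member before the archive is closed.
            info = tarfile.TarInfo(name)
            info.size = len(data)
            dst.addfile(info, io.BytesIO(data))
        os.replace(tmp_path, archive_path)
    except BaseException:
        # Clean up the partial file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling add_to_compressed_tar("backup.tar.gz", "notes.txt", b"hello") would
leave the original members in place and add notes.txt at the end.  For an
uncompressed archive none of this is needed, since tarfile.open(path, "a")
appends directly.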

A 'resume-friendly' compression algorithm would need to describe its
internal state at the end of the byte stream.  In the case of gzip (or
similar compression algorithms), the only reasonable way to do this is
to store an offset in the file to the last reset/initialization.  Of
course, the internal state must still be regenerated from the portion of
the file after that offset (which may be the entire file), so it isn't
really a win over just processing the entire file again with an
algorithm that discovers where to pick up where it left off.
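
A minimal sketch of the offset idea, using zlib directly.  It assumes the
writer explicitly asks for a full flush (zlib.Z_FULL_FLUSH, which resets
the compressor's state) and records the offset itself; an ordinary gzip
file carries no such index.  The function names are made up for
illustration, and raw deflate (wbits=-15) is used so that there is no
header or checksum to get in the way.

```python
import zlib


def compress_with_reset_point(head, tail):
    """Compress head + tail as raw deflate, noting where the state was reset."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate stream
    # Z_FULL_FLUSH pads to a byte boundary and forgets all history, so
    # nothing after this point refers back to anything before it.
    out = comp.compress(head) + comp.flush(zlib.Z_FULL_FLUSH)
    reset_offset = len(out)
    out += comp.compress(tail) + comp.flush()
    return out, reset_offset


def decompress_from(stream, offset):
    """Decompress a raw deflate stream starting at a recorded reset offset."""
    decomp = zlib.decompressobj(-15)
    return decomp.decompress(stream[offset:])
```

Decompressing from the recorded offset gives back only the data written
after the reset, and everything from there to the end of the stream still
has to be processed to find out what it contains.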

 - Josiah



