creating size-limited tar files

Alexander Blinne news at blinne.net
Wed Nov 7 20:05:30 CET 2012


I don't know the best way to find the current size; I only have a
general remark.
This solution is not so good if you have to impose a hard limit on the
resulting file size. In the worst case the tar is at limit - 1 when the
next file to be added is the biggest file, and you end up with a tar
file of size "limit - 1 + size of biggest file + overhead". Of course
that may be acceptable in many cases, or it may be enough to compensate
by lowering the limit accordingly.
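
To put a rough number on that worst case, here is a small sketch. It
assumes plain tar entries (one 512-byte header per member, data padded
to a multiple of 512 bytes) and ignores long-name or pax extension
headers and the end-of-archive padding. The function names are made up
for illustration.

```python
BLOCK = 512  # tar works in blocks of 512 bytes


def tar_entry_size(file_size):
    """Bytes one regular file takes in the archive: header plus padded data."""
    padded = -(-file_size // BLOCK) * BLOCK  # round up to a full block
    return BLOCK + padded


def worst_case_archive_size(limit, biggest_file):
    """Size reached if the archive is at limit - 1 when the biggest file is added."""
    return (limit - 1) + tar_entry_size(biggest_file)
```

If a hard limit matters, one could lower the limit passed to the
size check by tar_entry_size(biggest_file), at the price of smaller
archives on average.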

My idea:
Assuming tar_file works on some object with a file-like interface, one
could implement a "transparent splitting file" class, which would need
some kind of buffering mechanism. It would represent one big virtual
file that is stored in many pieces of fixed size (except the last) and
would let you just add all files to one tar_file and have the
underlying file object split it up transparently, something like

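import tarfile

# SplittingFile is the class proposed here; it does not exist yet.
tar_file = tarfile.open(mode='w', fileobj=SplittingFile(
    names='archiv.tar-%03d', chunksize=chunksize, mode='wb'))
while remaining_files:
    tar_file.add(remaining_files.pop())  # assumes a list of paths
tar_file.close()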

and the SplittingFile would automatically create chunks of size
chunksize with filenames archiv.tar-001, archiv.tar-002, ...
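
A minimal sketch of what such a class might look like, write-only and
without any buffering of its own. Everything here (the class, its
parameters, the attribute names) is hypothetical; it only implements
the few methods tarfile needs from a file object opened for writing.

```python
class SplittingFile:
    """Write-only file-like object that spreads its data over chunk files."""

    def __init__(self, names, chunksize, mode='wb'):
        if mode != 'wb':
            raise ValueError("this sketch only supports mode='wb'")
        self.names = names          # pattern such as 'archiv.tar-%03d'
        self.chunksize = chunksize  # maximum number of bytes per chunk
        self.index = 0              # number of the chunk currently open
        self.room = 0               # bytes still free in the current chunk
        self.pos = 0                # total bytes written so far
        self.chunk = None           # the real file behind the current chunk

    def _next_chunk(self):
        """Close the current chunk file and open the next one."""
        if self.chunk is not None:
            self.chunk.close()
        self.index += 1
        self.chunk = open(self.names % self.index, 'wb')
        self.room = self.chunksize

    def write(self, data):
        """Write data, continuing in a new chunk whenever one is full."""
        start = 0
        while start < len(data):
            if self.room == 0:
                self._next_chunk()
            part = data[start:start + self.room]
            self.chunk.write(part)
            self.room -= len(part)
            start += len(part)
        self.pos += len(data)
        return len(data)

    def tell(self):
        """Position in the virtual big file, not in the current chunk."""
        return self.pos

    def flush(self):
        if self.chunk is not None:
            self.chunk.flush()

    def close(self):
        if self.chunk is not None:
            self.chunk.close()
            self.chunk = None
```

Note that tarfile does not close a file object that was passed in via
fileobj, so the SplittingFile has to be closed separately after the
tar file itself has been closed.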

The same class could be used to put the pieces back together; it might
even implement transparent seeking over the set of pieces of a big
file. I would like to have such a class around for general use.
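
For the reading direction, a rough sketch could look like the
following. It is written as a separate, hypothetical class (JoinedFile)
rather than as part of the same one, just to keep it short. It reopens
the piece file on every read, which is simple but not fast, and it
assumes the pieces do not change while it is in use.

```python
import bisect
import io
import os


class JoinedFile(io.RawIOBase):
    """Read-only, seekable view of several piece files as one big file."""

    def __init__(self, paths):
        super().__init__()
        self.paths = list(paths)
        self.starts = []  # offset of each piece within the virtual file
        total = 0
        for path in self.paths:
            self.starts.append(total)
            total += os.path.getsize(path)
        self.size = total
        self.pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_CUR:
            offset += self.pos
        elif whence == io.SEEK_END:
            offset += self.size
        elif whence != io.SEEK_SET:
            raise ValueError("invalid whence")
        if offset < 0:
            raise ValueError("negative seek position")
        self.pos = offset
        return self.pos

    def readinto(self, buffer):
        """Read from the piece containing the current position (may be short)."""
        if self.pos >= self.size:
            return 0  # end of the virtual file
        i = bisect.bisect_right(self.starts, self.pos) - 1
        with open(self.paths[i], 'rb') as piece:
            piece.seek(self.pos - self.starts[i])
            n = piece.readinto(buffer)
        self.pos += n
        return n
```

A single read never crosses a piece boundary, so it can return fewer
bytes than requested. Wrapping the object as
io.BufferedReader(JoinedFile(paths)) takes care of that before handing
it to something like tarfile.open(fileobj=...).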

greetings

