creating size-limited tar files

Andrea Crotti andrea.crotti.0 at
Wed Nov 7 22:52:18 CET 2012

On 11/07/2012 08:32 PM, Roy Smith wrote:
> In article <509ab0fa$0$6636$9b4e6d93 at>,
>   Alexander Blinne <news at> wrote:
>> I don't know the best way to find the current size, I only have a
>> general remark.
>> This solution is not so good if you have to impose a hard limit on the
>> resulting file size. You could end up having a tar file of size "limit +
>> size of biggest file - 1 + overhead" in the worst case if the tar is at
>> limit - 1 and the next file is the biggest file. Of course that may be
>> acceptable in many cases or it may be acceptable to do something about
>> it by adjusting the limit.
> If you truly have a hard limit, one possible solution would be to use
> tell() to checkpoint the growing archive after each addition.  If adding
> a new file unexpectedly causes you exceed your hard limit, you can
> seek() back to the previous spot and truncate the file there.
> Whether this is worth the effort is an exercise left for the reader.

So I'm not sure if it's a hard limit or not, but I'll check tomorrow.
But in general I could also take the sizes of the individual files and 
simply estimate the total from them,
pushing as many files as should fit into each tarfile.
With compression I might end up with a much smaller file, but this 
would be much easier..
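
The estimate is actually exact for the uncompressed case: each member costs a 512-byte header plus the data padded up to a 512-byte boundary. A greedy packing along those lines (function names made up, just a sketch):

```python
import os

BLOCK = 512  # tar block size: 512-byte header + data padded to a 512-byte boundary

def member_size(path):
    """Uncompressed size a file will occupy inside a tar archive."""
    size = os.path.getsize(path)
    return BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK

def chunk_files(paths, limit):
    """Greedily group files so each group's estimated tar size stays under limit."""
    chunks, current, total = [], [], 0
    for p in paths:
        s = member_size(p)
        if current and total + s > limit:
            chunks.append(current)   # close the current chunk, start a new one
            current, total = [], 0
        current.append(p)
        total += s
    if current:
        chunks.append(current)
    return chunks
```

With gzip on top the chunks would only come out smaller than the estimate, so the limit still holds.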

But the other problem is that at the moment the people who receive our 
chunks reassemble the file with a simple:

cat file1.tar.gz file2.tar.gz > file.tar.gz

which I suppose is not going to work if I create 2 different tar files, 
since it would recreate the header in each of them, right?
So either I also provide a script to reassemble everything, or I have to 
split in a more "brutal" way..
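
Although, if I read the docs right, the cat might mostly work as-is: gzip streams concatenate cleanly, and a tar reader can be told to skip the end-of-archive zero blocks between chunks (GNU tar's --ignore-zeros/-i, or ignore_zeros=True in tarfile). A small in-memory demo of that idea (make_chunk is a hypothetical helper):

```python
import io
import tarfile

def make_chunk(names_and_data):
    """Build a small .tar.gz in memory (hypothetical helper for the demo)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in names_and_data:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# two independently created chunks, concatenated like `cat a.tar.gz b.tar.gz`
combined = make_chunk([("file1", b"aaa")]) + make_chunk([("file2", b"bbb")])

# a plain read stops at the first chunk's end-of-archive marker;
# ignore_zeros=True skips those zero blocks and reads both members
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:gz", ignore_zeros=True) as tar:
    print(tar.getnames())  # both file1 and file2 should show up
```

So the receivers would only need to pass -i to their tar, instead of running a reassembly script.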

Maybe doing the final split wasn't so bad after all; I'll first check 
whether it's actually more expensive for the filesystem (which is very, very slow)
or not a big deal...

More information about the Python-list mailing list