tarfile and progress information

Lars Gustäbel lars at gustaebel.de
Wed Jul 14 04:16:45 EDT 2010


On Wed, Jul 07, 2010 at 11:23:22AM -0400, Nathan Huesken wrote:
> I am packing large files with tarfile. Is there any way I can get
> progress information while packing?

There is no built-in way in tarfile, but there are several possible solutions:

1. Replace the tarfile.copyfileobj() function that is used to copy the files'
contents to the archive with your own.

2. Subclass the TarFile class, override the addfile() method, and wrap every
file object argument in an object that triggers a callback function for each
block of data being read:

"""
import sys
import os
import tarfile


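class FileProxy(object):
    """Wrap a real file and report read progress to a callback."""

    def __init__(self, fobj, callback):
        self.fobj = fobj
        self.callback = callback
        # Needs a real file descriptor, so this will not work for e.g. BytesIO.
        self.size = os.fstat(self.fobj.fileno()).st_size

    def read(self, size=-1):
        data = self.fobj.read(size)
        # Report after reading so that the last block shows 100%.
        self.callback(self.fobj.tell(), self.size)
        return data

    def close(self):
        self.callback(self.size, self.size)
        self.fobj.close()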


class MyTarFile(tarfile.TarFile):

    def __init__(self, *args, **kwargs):
        # Default to None so the class also works without a callback.
        self.callback = kwargs.pop("callback", None)
        super(MyTarFile, self).__init__(*args, **kwargs)

    def addfile(self, tarinfo, fileobj=None):
        if self.callback is not None and fileobj is not None:
            fileobj = FileProxy(fileobj, self.callback)
        super(MyTarFile, self).addfile(tarinfo, fileobj)


def callback(processed, total):
    # Guard against empty files, where total is 0.
    percent = processed * 100.0 / total if total else 100.0
    sys.stderr.write("%.1f%% \r" % percent)


tar = MyTarFile.open("test.tar.gz", "w:gz", callback=callback)
tar.add("...")
tar.close()
"""

3. If you have a defined set of files to add, you can do something like this
(FileProxy and callback are the ones defined above):

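tar = tarfile.open("test.tar.gz", "w:gz")
# filenames is your own list of regular files to add.
for filename in filenames:
    tarinfo = tar.gettarinfo(filename)
    fobj = FileProxy(open(filename, "rb"), callback)
    tar.addfile(tarinfo, fobj)
    fobj.close()
# Close the archive so that the end-of-archive blocks and gzip trailer are written.
tar.close()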


I hope this helps.

Regards,

-- 
Lars Gustäbel
lars at gustaebel.de

The power of accurate observation is called cynicism
by those who have not got it.
(George Bernard Shaw)


