question on using tarfile to read a *.tar.gzip file

m_ahlenius ahleniusm at gmail.com
Sun Feb 7 19:25:50 EST 2010


On Feb 7, 5:01 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> > Is there a way to do this, without decompressing each file to a temp
> > dir?  Like is there a method using some tarfile interface adapter to
> > read a compressed file?  Otherwise I'll just access each file, extract
> > it,  grab the 1st and last lines and then delete the temp file.
>
> I think you're looking for the extractfile() method of the
> TarFile object:
>
>    from glob import glob
>    from tarfile import TarFile
>    for fname in glob('*.tgz'):
>      print fname
>      tf = TarFile.gzopen(fname)
>      for ti in tf:
>        print ' %s' % ti.name
>        f = tf.extractfile(ti)
>        if not f: continue
>        fi = iter(f) # f doesn't natively support next()
>        first_line = line = next(fi, '') # '' if the member is empty
>        for line in fi: pass
>        f.close()
>        print "  First line: %r" % first_line
>        print "  Last line: %r" % line
>      tf.close()
>
> If you just want the first & last lines, it's a little more
> complex if you don't want to scan the entire file (like I do with
> the for-loop), but the file-like object returned by extractfile()
> is documented as supporting seek() so you can skip to the end and
> then read backwards until you have sufficient lines.  I wrote a
> "get the last line of a large file using seeks from the EOF"
> function which you can find at [1] which should handle the odd
> edge cases of $BUFFER_SIZE containing more or less than a full
> line and then reading backwards in chunks (if needed) until you
> have one full line, handling a one-line file, and other
> odd/annoying edge-cases.  Hope it helps.
>
> -tkc
>
> [1] http://mail.python.org/pipermail/python-list/2009-January/1186176.html

Thanks Tim - this was very helpful.  Just learning about tarfile.
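
For the archives, here is a rough sketch of the "seek from EOF and read
backwards in chunks" idea described above. It is not Tim's function from [1].
It assumes Python 3, where extractfile() returns a binary file object, and the
names first_and_last_line, summarize and chunk_size are made up for
illustration. One caveat: on a gzip-compressed archive the underlying stream
still has to be decompressed to reach the end of a member, and backward seeks
can be slow, so this may not beat a plain scan on a .tgz.

```python
import io
import os
import tarfile


def first_and_last_line(f, chunk_size=4096):
    """Return (first, last) line of a seekable binary file object, as bytes.

    The first line is read normally; the last line is found by reading
    backwards from EOF in chunks. An empty file gives (b"", b"").
    """
    f.seek(0)
    first = f.readline()
    end = f.seek(0, os.SEEK_END)
    if end == 0:
        return b"", b""
    pos = end
    tail = b""
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        tail = f.read(step) + tail
        # Ignore a single trailing newline so we locate the start of the
        # last line rather than the empty string that follows it.
        body = tail[:-1] if tail.endswith(b"\n") else tail
        idx = body.rfind(b"\n")
        if idx != -1:
            return first, tail[idx + 1:]
    # No earlier newline found: the whole file is one line.
    return first, tail


def summarize(archive_path):
    """Yield (member name, first line, last line) for each file in a tar."""
    with tarfile.open(archive_path, "r:*") as tf:  # "r:*" autodetects gzip
        for ti in tf:
            f = tf.extractfile(ti)
            if f is None:  # directories and other non-file members
                continue
            with f:
                yield (ti.name,) + first_and_last_line(f)
```

Something like "for name, first, last in summarize('logs.tgz'): print(name,
first, last)" would then print one line per member, where logs.tgz is just a
placeholder file name. The lines come back as bytes, so decode them if you
need text.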

'mark


