question on using tarfile to read a *.tar.gzip file
Tim Chase
python.list at tim.thechases.com
Sun Feb 7 18:01:24 EST 2010
> Is there a way to do this, without decompressing each file to a temp
> dir? Like is there a method using some tarfile interface adapter to
> read a compressed file? Otherwise I'll just access each file, extract
> it, grab the 1st and last lines and then delete the temp file.
I think you're looking for the extractfile() method of the
TarFile object:
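from glob import glob
from tarfile import TarFile

for fname in glob('*.tgz'):
    print(fname)
    tf = TarFile.gzopen(fname)
    for ti in tf:
        print(' %s' % ti.name)
        f = tf.extractfile(ti)
        if f is None:
            continue  # directories, links, etc. have no file data
        fi = iter(f)  # iter() works even where f has no next() of its own
        first_line = next(fi, None)  # None if the member is empty
        last_line = first_line  # a one-line member: first == last
        for last_line in fi:
            pass  # scan to the end; last_line keeps the final line
        f.close()
        print(" First line: %r" % first_line)
        print(" Last line: %r" % last_line)
    tf.close()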
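For Python 3, here is a minimal sketch of the same approach wrapped in a generator. The names first_and_last_lines() and build_demo_archive() are my own (hypothetical, not part of tarfile). It assumes gzip-compressed input opened with tarfile.open(..., mode="r:gz"), skips non-regular members via TarInfo.isfile(), and reports None for an empty member. Lines come back as bytes because extractfile() returns a binary file object.

```python
import io
import tarfile


def first_and_last_lines(tar_path=None, fileobj=None):
    """Yield (member name, first line, last line) for every regular file
    in a gzip-compressed tar, reading members in memory rather than
    extracting them.  Hypothetical helper, not part of tarfile."""
    with tarfile.open(name=tar_path, mode="r:gz", fileobj=fileobj) as tf:
        for ti in tf:
            if not ti.isfile():
                continue  # directories, links, etc. have no file data
            with tf.extractfile(ti) as f:
                first = last = None  # both stay None for an empty member
                for line in f:  # binary file object, so lines are bytes
                    if first is None:
                        first = line
                    last = line
            yield ti.name, first, last


def build_demo_archive():
    """Build a small .tgz in memory for trying the function out."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in [("a.txt", b"one\ntwo\nthree\n"),
                           ("b.txt", b"solo\n"),
                           ("empty.txt", b"")]:
            ti = tarfile.TarInfo(name)
            ti.size = len(data)
            tf.addfile(ti, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


if __name__ == "__main__":
    for name, first, last in first_and_last_lines(fileobj=build_demo_archive()):
        print(name, first, last)
```

For an archive on disk, call first_and_last_lines(tar_path="foo.tgz") instead (the filename is just an example).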
If you only want the first and last lines and don't want to scan
the entire member (as my for-loop does), it gets a little more
complex. The file-like object returned by extractfile() is
documented as supporting seek(), so you can jump to the end and
then read backwards until you have enough lines. I wrote a "get
the last line of a large file using seeks from the EOF" function,
which you can find at [1]. It should handle the odd edge cases: a
$BUFFER_SIZE chunk containing more or less than a full line
(reading further chunks backwards, if needed, until it has one
full line), a one-line file, and other odd/annoying edge cases.
Hope it helps.
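Here is my own minimal sketch of the seek-from-EOF idea (not the function at [1]). last_line() and last_lines_in_archive() are hypothetical names, and the sketch assumes a seekable binary file object with "\n" line endings. It reads chunk_size bytes at a time backwards from the end until it has seen a newline before the final byte, which covers a chunk holding more or less than a full line, a one-line file, and an empty file (returns None). One caveat, as far as I know of CPython's gzip module: a backward seek on a gzip stream is emulated by rewinding and decompressing forward again, so on a .tgz this saves line splitting but not decompression, and a single forward scan like the loop above may well be faster. The trick pays off mainly on an uncompressed .tar or a plain file.

```python
import io
import os
import tarfile


def last_line(f, chunk_size=1024):
    """Return the last line of a seekable binary file object (newline
    included if present), or None for an empty file.  Hypothetical helper."""
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    tail = b""
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        tail = f.read(step) + tail  # prepend the chunk just before `tail`
        # A newline anywhere but the final byte means `tail` now holds
        # the whole last line, so we can stop reading backwards.
        if b"\n" in tail[:-1]:
            break
    if not tail:
        return None
    # Start just after the last newline that isn't the trailing one;
    # rfind() returns -1 for a one-line file, so all of `tail` is kept.
    cut = tail.rfind(b"\n", 0, len(tail) - 1)
    return tail[cut + 1:]


def last_lines_in_archive(fileobj):
    """Yield (member name, last line) for each regular file in a .tgz
    read from `fileobj`.  Hypothetical helper for illustration."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tf:
        for ti in tf:
            if ti.isfile():
                with tf.extractfile(ti) as f:
                    yield ti.name, last_line(f)


if __name__ == "__main__":
    print(last_line(io.BytesIO(b"one\ntwo\nthree\n"), chunk_size=4))
```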
-tkc
[1] http://mail.python.org/pipermail/python-list/2009-January/1186176.html