How to process a very large (4 GB) tarfile from Python?

Lars Gustäbel lars at gustaebel.de
Thu Jul 17 22:21:34 CEST 2008


On Thu, Jul 17, 2008 at 10:39:23AM -0700, Uwe Schmitt wrote:
> On 17 Jul., 17:55, Terry Carroll <carr... at nospam-tjc.com> wrote:
> > On Thu, 17 Jul 2008 06:14:45 -0700 (PDT), Uwe Schmitt
> > <rocksportroc... at googlemail.com> wrote:
> > >I had a look at tarfile.py in the lib path of my current Python 2.5
> > >installation. The iterator caches TarInfo objects in the list
> > >tf.members. If you only want to iterate and are not interested
> > >in more functionality, you could use "tf.members = []" inside
> > >your loop. This is a dirty hack!
> >
> > Thanks, Uwe.  That works fine for me.  It now reads through all 2.5
> > million members, in about 30 minutes, never going above a 4M working
> > set.
> 
> Maybe we should post this issue to python-dev mailing list.
> Parsing large tar-files is not uncommon.

This issue is known and has been fixed for Python 3.0; see
http://bugs.python.org/issue2058.
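
For reference, the workaround quoted above can be sketched roughly as
follows. This is only an illustration under assumptions: the helper names
scan_members and make_demo_archive are made up for this example, and the
tiny in-memory archive stands in for the real 4 GB file. It relies on
tf.members being a plain list that can be reset during iteration, which is
an implementation detail of tarfile.py and not a documented interface, so
please check it against the Python version you actually run.

```python
import io
import tarfile


def scan_members(fileobj):
    """Iterate over a tar archive and return (member_count, total_bytes).

    Hypothetical helper for illustration only.
    """
    count = 0
    total = 0
    tf = tarfile.open(fileobj=fileobj, mode="r")
    try:
        for info in tf:
            count += 1
            total += info.size
            # The "dirty hack" from the quoted message: throw away the
            # TarInfo objects cached so far so the list cannot grow.
            # Assumption: members is an ordinary list attribute.
            tf.members = []
    finally:
        tf.close()
    return count, total


def make_demo_archive(n):
    """Build a small in-memory tar with n members (made-up test data)."""
    buf = io.BytesIO()
    tf = tarfile.open(fileobj=buf, mode="w")
    for i in range(n):
        data = b"x" * i  # member i holds i bytes
        info = tarfile.TarInfo(name="member%d.txt" % i)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    tf.close()
    buf.seek(0)
    return buf


result = scan_members(make_demo_archive(5))
```

With a real archive you would pass an open file object (or use
tarfile.open(name) directly) instead of the demo buffer. Note that after
clearing the list, methods that depend on the cached members, such as
getmembers() or getnames(), will no longer see the entries already passed.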

-- 
Lars Gustäbel
lars at gustaebel.de

It is not enough to have no thoughts;
one must also be incapable of expressing them.
(anonymous)


