[Python-Dev] Lack of sequential decompression in the zipfile module

Derek Shockey derek.shockey at gmail.com
Fri Feb 16 20:53:20 CET 2007


Though I am an avid Python programmer, I've never forayed into the area of
developing Python itself, so I'm not exactly sure how all this works.

I was confused (and somewhat disturbed) to discover recently that the
zipfile module offers only one-shot decompression of files, accessible only
via the read() method. It is my understanding that the module will handle
files of up to 4 GB in size, and the idea of decompressing 4 GB directly
into memory makes me a little queasy. Other related modules (zlib, tarfile,
gzip, bz2) all offer sequential decompression, but this does not seem to
be the case for zipfile (even though the underlying zlib makes it easy to
do).

Since I was writing a script to work with potentially very large zipped
files, I took it upon myself to write an extract() method for zipfile, which
is essentially an adaptation of the read() method modeled after tarfile's
extract(). I feel that this is something that should really be provided in
the zipfile module to make it more usable. I'm wondering if this has been
discussed before, or if anyone has ever viewed this as a problem. I can post
the code I wrote as a patch, though I'm not sure if my file I/O handling is
as robust as it needs to be for the stdlib. I'd appreciate any insight into
the issue or direction on where I might proceed from here so as to fix what
I see as a significant problem.

Thanks,
Derek
