[Python-Dev] Lack of sequential decompression in the zipfile module

Michele Simionato michele.simionato at gmail.com
Sat Feb 17 08:10:13 CET 2007


Derek Shockey <derek.shockey <at> gmail.com> writes:

> 
> Though I am an avid Python programmer, I've never forayed into the
> area of developing Python itself, so I'm not exactly sure how all
> this works. I was confused (and somewhat disturbed) to discover
> recently that the zipfile module offers only one-shot decompression
> of files, accessible only via the read() method. It is my
> understanding that the module will handle files of up to 4 GB in
> size, and the idea of decompressing 4 GB directly into memory makes
> me a little queasy. Other related modules (zlib, tarfile, gzip, bz2)
> all offer sequential decompression, but this does not seem to be the
> case for zipfile (even though the underlying zlib makes it easy to
> do).
>
> Since I was writing a script to work with potentially very large
> zipped files, I took it upon myself to write an extract() method for
> zipfile, which is essentially an adaptation of the read() method
> modeled after tarfile's extract(). I feel that this is something that
> should really be provided in the zipfile module to make it more
> usable. I'm wondering if this has been discussed before, or if anyone
> has ever viewed this as a problem. I can post the code I wrote as a
> patch, though I'm not sure if my file IO handling is as robust as it
> needs to be for the stdlib. I'd appreciate any insight into the issue
> or direction on where I might proceed from here so as to fix what I
> see as a significant problem.

This is definitely a significant problem. We had to face it at work, and
in the end we decided to use zipstream
(http://www.doxdesk.com/software/py/zipstream.html) instead of zipfile,
but of course having the functionality in the standard library would be
much better.
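
To make "sequential decompression" concrete, here is a minimal sketch of
the idea using only zlib. It is not Derek's patch and not zipstream's
code; the names inflate_in_chunks and chunk_size are made up for
illustration. It assumes src is a binary file object already positioned
at the start of a deflated member's compressed data and that reading it
to EOF yields exactly that data (a real extract() would also have to
parse the zip headers and stop after the member's compressed size).

```python
import zlib


def inflate_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy a raw deflate stream from src to dst without loading it whole.

    Hypothetical helper: src and dst are binary file-like objects.
    Returns the number of decompressed bytes written.
    """
    # A negative wbits value selects a raw deflate stream (no zlib
    # header or checksum), which is the form used for deflated zip members.
    decomp = zlib.decompressobj(-15)
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf = chunk
        while buf:
            # max_length caps the output of each call, so memory use
            # stays bounded even for highly compressible input.
            data = decomp.decompress(buf, chunk_size)
            dst.write(data)
            total += len(data)
            # Input that was not consumed because of the cap is kept here.
            buf = decomp.unconsumed_tail
    # Emit anything still buffered inside the decompressor.
    tail = decomp.flush()
    dst.write(tail)
    return total + len(tail)
```

With something like this, peak memory depends on chunk_size rather than
on the size of the archived file, which is the property read() lacks.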

 Michele Simionato



