[Python-Dev] Re: [Pythonmac-SIG] zipfile still has 2GB boundary bug

Bob Ippolito bob at redivi.com
Wed Apr 27 03:00:43 CEST 2005


On Apr 26, 2005, at 8:24 PM, Guido van Rossum wrote:

>>> Someone should think about rewriting the zipfile module to be less
>>> hideous, include a repair feature, and be up to date with the latest
>>> specifications <http://www.pkware.com/company/standards/appnote/>.
>>
>> -- and allow *deleting* a file from a zipfile. As far as I can tell,
>> you now can't (except by rewriting everything but that to a new zipfile
>> and renaming). Somewhere I saw a patch request for this, but it was
>> languishing, a year or more old. Or am I just totally missing
>> something?
>
> Please don't propose a grand rewrite (even if it's only a single module).
> Given that the API is mostly sensible, please propose gradual
> refactoring of the implementation, perhaps some new API methods, and
> so on. Don't throw away the work that went into making it work in the
> first place!

Well, I didn't necessarily mean it should be thrown away and started 
from scratch -- however, once you get all the ugly out of it, there's 
not much left!  Obviously there's something wrong with the way it's 
written if it took years and *several passes* to correctly identify and 
fix a simple format character case bug.  Most of this can be blamed on 
the struct module, which is more obscure and error-prone than writing 
the same code in C.
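
To illustrate how easy that kind of mistake is to make, here is a minimal
sketch. It assumes the bug came down to a signed 32-bit format character
("l") being used where an unsigned one ("L") was needed; the helper name
`fits` and the constant `OFFSET` are made up for this example.

```python
import struct

# First byte offset past the 2 GB boundary (illustrative value).
OFFSET = 2**31


def fits(fmt, value):
    """Return True if struct can pack value with the given format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# "<l" is a signed 32-bit little-endian field, "<L" is the unsigned one.
signed_ok = fits("<l", OFFSET)
unsigned_ok = fits("<L", OFFSET)
```

The two format strings differ only in the case of one letter, which is
exactly the kind of thing that survives several review passes.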

One of the most useful things that could happen to the zipfile module 
would be a stream interface for both reading and writing.  Right now 
it's slow and memory hungry when dealing with large chunks.  The use 
case that led me to fix this bug is a tool that archives video to zip 
files of targa sequences with a reference QuickTime movie, so I end up 
with thousands of bite-sized chunks.
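
For reference, a rough sketch of what such a stream interface can look
like. It assumes a later Python (3.6 or newer), where `ZipFile.open()`
accepts mode "w" as well as "r"; that was not available when this was
written. The helper names `write_chunks` and `read_chunks` are hypothetical.

```python
import io
import zipfile


def write_chunks(zf, arcname, chunks):
    """Write an archive member piece by piece instead of as one big string."""
    with zf.open(arcname, "w") as dest:
        for chunk in chunks:
            dest.write(chunk)


def read_chunks(zf, arcname, chunk_size=64 * 1024):
    """Yield an archive member in blocks of at most chunk_size bytes."""
    with zf.open(arcname) as src:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            yield block


def roundtrip(chunks, chunk_size=4):
    """Write chunks to an in-memory archive and read them back in blocks."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        write_chunks(zf, "frame.tga", chunks)
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        return list(read_chunks(zf, "frame.tga", chunk_size))
```

Neither side needs to hold a whole member in memory at once, which is the
point for large sequences.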

This >2GB bug really caused me some grief in that I didn't test with 
such large sequences because I didn't have any.  I didn't end up 
finding out about it until months later because the client *ignored* the 
exceptions raised by the GUI and came back to me with broken zip files. 
Fortunately the TOC in a zip file can be reconstructed from an 
otherwise pristine stream.  Of course, I had to rewrite half of the 
zipfile module to come up with such a recovery program, because it's 
not designed well enough to let me build such a tool on top of it.
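
The idea behind that kind of recovery, as a minimal sketch rather than
the actual tool: walk the local file headers from the start of the stream
and ignore the central directory entirely. It assumes the entries are
stored back to back from offset 0 and carry their sizes in the local
header (no data descriptors, no ZIP64). The names `scan_local_headers` and
`LOCAL_HEADER` are made up here; the field layout follows the appnote.

```python
import struct

# Local file header: signature, version, flags, method, time, date,
# crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHLLLHH")
LOCAL_SIG = b"PK\x03\x04"


def scan_local_headers(data):
    """Return (name, header_offset, compressed_size) for each entry found."""
    entries = []
    pos = 0
    while pos + LOCAL_HEADER.size <= len(data):
        (sig, _ver, flags, _method, _time, _date,
         _crc, csize, _usize, nlen, xlen) = LOCAL_HEADER.unpack_from(data, pos)
        if sig != LOCAL_SIG:
            break  # reached the central directory or damaged data
        if flags & 0x08:
            # Sizes live in a trailing data descriptor; not handled here.
            raise ValueError("entry uses a data descriptor")
        name_start = pos + LOCAL_HEADER.size
        name = data[name_start:name_start + nlen].decode("cp437")
        entries.append((name, pos, csize))
        # Skip over the name, the extra field, and the compressed payload.
        pos = name_start + nlen + xlen + csize
    return entries
```

A real repair tool would also have to recompute or verify CRCs and write a
fresh central directory from the collected entries.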

Another "bug" I ran into was that it has some crazy default for the 
ZipInfo record: it assumes the platform ("create_system") is Windows 
regardless of where you are!  This caused some really subtle and 
annoying issues with some unzip tools (of course, on everyone's 
machines except mine).  Fortunately someone was able to figure out why 
and send me a patch, but it was completely unexpected and I didn't see 
such craziness documented anywhere.  If it weren't for this patch, it'd 
either still be broken, or I'd have switched to some other way of 
creating archives!
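
For anyone hitting the same thing, a minimal sketch of the workaround:
set the field explicitly on each ZipInfo instead of trusting the default.
This is not the patch I was sent; the helper name `add_unix_entry` and the
file mode are assumptions for illustration. The value 3 is the appnote's
code for Unix, and 0 is MS-DOS/FAT.

```python
import io
import zipfile

UNIX = 3  # "version made by" host system code for Unix in the appnote


def add_unix_entry(zf, arcname, payload, mode=0o644):
    """Add payload under arcname, marked as created on a Unix system."""
    info = zipfile.ZipInfo(arcname)
    info.create_system = UNIX
    # Unix permission bits are stored in the high 16 bits of external_attr.
    info.external_attr = mode << 16
    zf.writestr(info, payload)


def build_archive(entries):
    """Return the bytes of an archive built from (arcname, payload) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for arcname, payload in entries:
            add_unix_entry(zf, arcname, payload)
    return buf.getvalue()
```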

The zipfile module is good enough to create input files for zipimport, 
a use that is well tested and generally works -- barring the fact that 
zipimport has quite a few rough edges of its own.  I certainly wouldn't 
recommend zipfile for any heavy duty tasks in its current state.

-bob


