[Python-Dev] Re: [Pythonmac-SIG] zipfile still has 2GB boundary
Guido van Rossum
gvanrossum at gmail.com
Wed Apr 27 06:19:47 CEST 2005
> > Please don't propose a grand rewrite (even it's only a single module).
> > Given that the API is mostly sensible, please propose gradual
> > refactoring of the implementation, perhaps some new API methods, and
> > so on. Don't throw away the work that went into making it work in the
> > first place!
> Well, I didn't necessarily mean it should be thrown away and started
> from scratch
Well, you *did* say "rewrite". :-)
> -- however, once you get all the ugly out of it, there's
> not much left! Obviously there's something wrong with the way it's
> written if it took years and *several passes* to correctly identify and
> fix a simple format character case bug. Most of this can be blamed on
> the struct module, which is more obscure and error-prone than writing
> the same code in C.
I think the reason is different -- it just hasn't had all that much
use beyond the one use case for which it was written (zipping up the
Python library). Also, don't underestimate the baroqueness of the zip format.
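The email only calls it "a simple format character case bug". Suppose it was a signed lowercase "l" used where an unsigned uppercase "L" was needed for 32-bit offsets and sizes. That is an assumption, though it fits the 2GB subject line. If so, the boundary falls straight out of the struct module's format codes. The sketch below shows the effect; decode_offset and encode_offset are hypothetical helpers, not zipfile internals.

```python
import struct

# 3 GiB (0xC0000000) as a little-endian 32-bit field, as it would appear in a zip header
RAW = (3 * 1024**3).to_bytes(4, "little")

def decode_offset(raw, fmt):
    """Unpack one 4-byte header field with the given struct format."""
    return struct.unpack(fmt, raw)[0]

def encode_offset(value, fmt):
    """Pack one header field, or return None if the format cannot hold the value."""
    try:
        return struct.pack(fmt, value)
    except struct.error:
        return None

# Lowercase "l" is a signed 32-bit int, so anything past 2 GiB reads back negative
print(decode_offset(RAW, "<l"))  # -1073741824
# Uppercase "L" is unsigned and decodes the same bytes correctly
print(decode_offset(RAW, "<L"))  # 3221225472
# Writing fails outright with "l" but succeeds with "L"
print(encode_offset(3 * 1024**3, "<l"))  # None
```

Past 4GB even "L" runs out of room. The Zip64 extensions to the format exist to handle that case.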
> One of the most useful things that could happen to the zipfile module
> would be a stream interface for both reading and writing. Right now
> it's slow and memory hungry when dealing with large chunks. The use
> case that led me to fix this bug is a tool that archives video to zip
> files of targa sequences with a reference QuickTime movie, so I end up
> with thousands of bite-sized chunks.
Sounds like a use case nobody else has tried yet.
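Later versions of the module did gain something close to the requested stream interface. Since Python 3.6, ZipFile.open() accepts mode="w" and returns a writable file object, so a member can be copied in fixed-size chunks instead of being held in memory as one string. The sketch below assumes Python 3.6+. The helpers add_stream and extract_stream, the chunk size, and the frame names are hypothetical.

```python
import io
import shutil
import zipfile

CHUNK = 1 << 20  # copy 1 MiB at a time instead of holding a whole member in memory

def add_stream(zf, arcname, src):
    """Copy a readable binary stream into the archive chunk by chunk."""
    # force_zip64=True: the final size is not known up front and may exceed 2 GiB
    with zf.open(arcname, mode="w", force_zip64=True) as dest:
        shutil.copyfileobj(src, dest, CHUNK)

def extract_stream(zf, arcname, dst):
    """Copy one archive member into a writable binary stream chunk by chunk."""
    with zf.open(arcname) as src:
        shutil.copyfileobj(src, dst, CHUNK)

# Usage: archive a few stand-in "frames", then stream one back out
buf = io.BytesIO()
frame = b"\x00\x01" * 50_000  # placeholder bytes standing in for one targa frame
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        add_stream(zf, f"frame{i:04d}.tga", io.BytesIO(frame))

out = io.BytesIO()
with zipfile.ZipFile(buf) as zf:
    extract_stream(zf, "frame0001.tga", out)
```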
> This >2GB bug really caused me some grief in that I didn't test with
> such large sequences because I didn't have any. I didn't end up
> finding out about it until months later because the client *ignored* the
> exceptions raised by the GUI and came back to me with broken zip files.
> Fortunately the TOC in a zip file can be reconstructed from an
> otherwise pristine stream. Of course, I had to rewrite half of the
> zipfile module to come up with such a recovery program, because it's
> not designed well enough to let me build such a tool on top of it.
Given more typical use cases for zip files (sending around collections
of source files) I'm not surprised that a bug that only occurs for
files >2GB remained hidden for so long.
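The recovery trick works because each member is preceded by a local file header that repeats its name, CRC, and sizes. This sketch assumes the members were written without data descriptors (general-purpose flag bit 3) and without Zip64 sizes. Under those assumptions, a table of contents can be rebuilt by walking those headers from the start of the stream. It is not the recovery tool mentioned in the email, and scan_local_headers is a hypothetical name.

```python
import io
import struct
import zipfile

# Fixed 30-byte local file header from the PKWARE APPNOTE: signature, version
# needed, flags, method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIG = b"PK\x03\x04"
FLAG_DATA_DESCRIPTOR = 0x08  # sizes/CRC stored after the data, not in the header
FLAG_UTF8 = 0x800            # file name is UTF-8 rather than CP437

def scan_local_headers(data):
    """Rebuild a table of contents by walking local file headers from offset 0."""
    entries = []
    pos = 0
    while pos + LOCAL_HEADER.size <= len(data):
        (sig, _version, flags, method, _mtime, _mdate,
         crc, comp_size, file_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, pos)
        if sig != LOCAL_SIG:
            break  # central directory reached, or the stream was cut off here
        if flags & FLAG_DATA_DESCRIPTOR:
            raise ValueError(f"entry at offset {pos} uses a data descriptor")
        # Zip64 entries (sizes of 0xFFFFFFFF) are not handled in this sketch
        name_start = pos + LOCAL_HEADER.size
        raw_name = data[name_start:name_start + name_len]
        name = raw_name.decode("utf-8" if flags & FLAG_UTF8 else "cp437")
        data_start = name_start + name_len + extra_len
        entries.append({
            "name": name,
            "header_offset": pos,
            "data_offset": data_start,
            "method": method,
            "crc": crc,
            "compress_size": comp_size,
            "file_size": file_size,
        })
        pos = data_start + comp_size  # the next local header follows the data
    return entries

# Usage: build a small archive, then drop its central directory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world!")
raw = buf.getvalue()
cd_offset = struct.unpack("<L", raw[-6:-2])[0]  # from the end record (no comment)
damaged = raw[:cd_offset]
for entry in scan_local_headers(damaged):
    print(entry["name"], entry["file_size"])
```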
I don't remember if you have Python CVS permissions, but you sound
like you really know the module as well as the zip file spec, so I'm
hoping that you'll find the time to do some reconstructive surgery on
the zip module for Python 2.5, without breaking the existing APIs. I
like the idea you have for a stream API; I recall that the one time I
had to use it I was surprised that the API dealt with files as string
> Another "bug" I ran into was that it has some crazy default for the
> ZipInfo record: it assumes the platform ("create_system") is Windows
> regardless of where you are!
I vaguely recall that the initial author was a Windows-head; perhaps
he didn't realize how useful the module would be on other platforms,
or that it would make any difference at all.
> This caused some really subtle and
> annoying issues with some unzip tools (of course, on everyone's
> machines except mine). Fortunately someone was able to figure out why
> and send me a patch, but it was completely unexpected and I didn't see
> such craziness documented anywhere. If it weren't for this patch, it'd
> either still be broken, or I'd have switched to some other way of
> creating archives!
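The create_system value ends up in the high byte of the "version made by" field. The zip spec assigns 0 to MS-DOS/FAT and 3 to Unix. Some unzip tools decide from it whether the high 16 bits of external_attr hold Unix permission bits, which would fit the machine-dependent breakage described above. Current CPython picks 3 automatically on non-Windows platforms, but setting it explicitly makes the intent visible. A sketch; unix_zipinfo is a hypothetical helper.

```python
import io
import stat
import zipfile

MADE_BY_MSDOS = 0  # host system codes for the "version made by" field
MADE_BY_UNIX = 3

def unix_zipinfo(arcname, mode=0o644):
    """Hypothetical helper: a ZipInfo that records Unix as the creating system."""
    info = zipfile.ZipInfo(arcname)
    info.create_system = MADE_BY_UNIX
    # Unix tools look for st_mode-style bits in the high 16 bits of external_attr
    info.external_attr = (stat.S_IFREG | mode) << 16
    return info

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(unix_zipinfo("run.sh", 0o755), b"#!/bin/sh\necho hi\n")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("run.sh")
    print(info.create_system, oct((info.external_attr >> 16) & 0o777))  # 3 0o755
```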
> The zipfile module is good enough to create input files for zipimport,
> which is well tested and generally works -- barring the fact that
> zipimport has quite a few rough edges of its own. I certainly wouldn't
> recommend it for any heavy duty tasks in its current state.
So, please fix it!
--Guido van Rossum (home page: http://www.python.org/~guido/)