[Python-Dev] Re: [Pythonmac-SIG] zipfile still has 2GB boundary
bob at redivi.com
Wed Apr 27 03:00:43 CEST 2005
On Apr 26, 2005, at 8:24 PM, Guido van Rossum wrote:
>>> Someone should think about rewriting the zipfile module to be less
>>> hideous, include a repair feature, and be up to date with the latest
>>> specifications <http://www.pkware.com/company/standards/appnote/>.
>> -- and allow *deleting* a file from a zipfile. As far as I can tell,
>> you now can't (except by rewriting everything but that to a new
>> and renaming). Somewhere I saw a patch request for this, but it was
>> languishing, a year or more old. Or am I just totally missing
> Please don't propose a grand rewrite (even it's only a single module).
> Given that the API is mostly sensible, please propose gradual
> refactoring of the implementation, perhaps some new API methods, and
> so on. Don't throw away the work that went into making it work in the
> first place!
Well, I didn't necessarily mean it should be thrown away and started
from scratch -- however, once you get all the ugly out of it, there's
not much left! Obviously there's something wrong with the way it's
written if it took years and *several passes* to correctly identify and
fix a simple format character case bug. Most of this can be blamed on
the struct module, which is more obscure and error-prone than writing
the same code in C.
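
To illustrate the kind of format character case mistake I mean, here is a minimal sketch. It assumes the bug class is a signed 32-bit code ("l") used where an unsigned one ("L") was needed; the helper name pack_size is made up for the example and is not part of zipfile.

```python
import struct

def pack_size(fmt, size):
    """Pack one size field with the given struct format.

    Returns None when the value does not fit the format.
    (Hypothetical helper, for illustration only.)
    """
    try:
        return struct.pack(fmt, size)
    except struct.error:
        return None

big = 3 * 1024 ** 3  # 3 GiB, above the signed 32-bit maximum

signed = pack_size("<l", big)    # lowercase "l": signed 32-bit, overflows
unsigned = pack_size("<L", big)  # uppercase "L": unsigned 32-bit, fits
```

The two format strings differ only in the case of one letter, which is what makes this sort of bug easy to miss in review.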
One of the most useful things that could happen to the zipfile module
would be a stream interface for both reading and writing. Right now
it's slow and memory hungry when dealing with large chunks. The use
case that led me to fix this bug is a tool that archives video to zip
files of targa sequences with a reference QuickTime movie, so I end up
with thousands of bite-sized chunks.
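
As a rough sketch of what a stream interface could look like: later Python 3 releases let ZipFile.open() return a file-like object for reading, and (from 3.6) for writing with mode="w". The example below assumes such a version; write_streamed, read_streamed and the chunk size are names invented for the example.

```python
import io
import zipfile

CHUNK = 64 * 1024  # arbitrary chunk size chosen for the example

def write_streamed(zf, arcname, source, chunk_size=CHUNK):
    """Copy a file-like object into the archive one chunk at a time."""
    with zf.open(arcname, mode="w") as dest:
        while True:
            block = source.read(chunk_size)
            if not block:
                break
            dest.write(block)

def read_streamed(zf, arcname, chunk_size=CHUNK):
    """Yield an archive member's data one chunk at a time."""
    with zf.open(arcname) as src:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            yield block

# Round trip through an in-memory archive.
payload = b"frame" * 50000
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    write_streamed(zf, "frames/0001.tga", io.BytesIO(payload))
with zipfile.ZipFile(buf) as zf:
    restored = b"".join(read_streamed(zf, "frames/0001.tga"))
```

Neither direction needs the whole member in memory at once, which is the property that matters for large sequences.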
This >2GB bug really caused me some grief in that I didn't test with
such large sequences because I didn't have any. I didn't end up
finding out about it until months later because the client *ignored* the
exceptions raised by the GUI and came back to me with broken zip files.
Fortunately the TOC in a zip file can be reconstructed from an
otherwise pristine stream. Of course, I had to rewrite half of the
zipfile module to come up with such a recovery program, because it's
not designed well enough to let me build such a tool on top of it.
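
For the curious, here is a simplified sketch of the idea behind rebuilding the TOC from the stream; it is not my actual recovery program. It walks the local file headers from the start of the data and assumes the sizes in them are valid (no data descriptors, no zip64). scan_local_headers is a hypothetical name.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header: signature, version, flags,
# method, mod time, mod date, CRC-32, compressed size, uncompressed size,
# name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIG = b"PK\x03\x04"

def scan_local_headers(data):
    """Return (name, header_offset, compressed_size) for each member found."""
    entries = []
    pos = 0
    while pos + LOCAL_HEADER.size <= len(data):
        (sig, _ver, flags, _method, _time, _date,
         _crc, csize, _usize, nlen, xlen) = LOCAL_HEADER.unpack_from(data, pos)
        if sig != LOCAL_SIG:
            break  # reached the central directory or something unknown
        if flags & 0x08:
            break  # sizes live in a data descriptor; not handled here
        name_start = pos + LOCAL_HEADER.size
        name = data[name_start:name_start + nlen].decode("cp437")
        entries.append((name, pos, csize))
        pos = name_start + nlen + xlen + csize  # jump to the next header
    return entries

# Build a small archive, then cut off its central directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.tga", b"first frame")
    zf.writestr("b.tga", b"second frame")
data = buf.getvalue()
damaged = data[:data.index(b"PK\x01\x02")]  # central directory signature
recovered = scan_local_headers(damaged)
```

A real tool would also have to cope with data descriptors and with signatures that happen to appear inside compressed data, which this sketch sidesteps.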
Another "bug" I ran into was that it has some crazy default for the
ZipInfo record: it assumes the platform ("create_system") is Windows
regardless of where you are! This caused some really subtle and
annoying issues with some unzip tools (of course, on everyone's
machines except mine). Fortunately someone was able to figure out why
and send me a patch, but it was completely unexpected and I didn't see
such craziness documented anywhere. If it weren't for this patch, it'd
either still be broken, or I'd have switched to some other way of
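
A hedged sketch of the workaround: set create_system explicitly on the ZipInfo record instead of trusting the default. The value 3 is the Unix host code in the PKWARE appnote (0 is MS-DOS/FAT); the member name and permission bits are invented for the example.

```python
import io
import zipfile

info = zipfile.ZipInfo("movie/reference.mov")  # hypothetical member name
info.create_system = 3                # 3 = Unix, 0 = MS-DOS/FAT
info.external_attr = 0o644 << 16      # Unix permission bits in the high word

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"placeholder data")

# Read the record back to confirm what was stored.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("movie/reference.mov")
```

Unzip tools that interpret external_attr according to the host system will then see consistent metadata.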
The zipfile module is good enough to create input files for zipimport,
which is well tested and generally works -- barring the fact that
zipimport has quite a few rough edges of its own. I certainly wouldn't
recommend it for any heavy duty tasks in its current state.