[Python-Dev] zipfile still has 2GB boundary bug

Bob Ippolito bob at redivi.com
Mon Apr 25 04:32:35 CEST 2005


The "2GB bug" that was supposed to be fixed in 
<http://python.org/sf/679953> was not actually fixed.  The zipinfo 
offsets in the structures are still signed longs, so the fix allows you 
to write one file that extends past the 2G boundary, but if any extend 
past that point you are screwed.

I have opened a new bug and patch that should fix this issue 
<http://python.org/sf/1189216>.  This is a backport candidate to 2.4.2 
and 2.3.6 (if that ever happens).

On a related note, if anyone else has a bunch of really big and 
ostensibly broken zip archives created by dumb versions of the zipfile 
module, I have written a script that can rebuild the central directory 
in-place.  Ping me off-list if you're interested and I'll clean it up.

Someone should think about rewriting the zipfile module to be less 
hideous, include a repair feature, and be up to date with the latest 
specifications <http://www.pkware.com/company/standards/appnote/>.

Additionally, it'd also be useful if someone were to include support 
for Apple's "extensions" to the zip format (the __MACOSX folder and its 
contents) that show up when BOM (private framework) is used to create 
archives (i.e. Finder in Mac OS X 10.3+).  I'm not sure if these are 
documented anywhere, but I can help with reverse engineering if someone 
is interested in writing the code.

On that note, Mac OS X 10.4 (Tiger) is supposed to have new APIs (or 
changes to existing APIs?) to facilitate resource fork preservation, 
ACLs, and Spotlight hooks in tar, cp, mv, etc.  Someone should spend 
some time looking at the Darwin 8 sources for these tools (when they're 
publicly available in the next few weeks) to see what would need to be 
done in Python to support them in the standard library (the os, 
tarfile, etc. modules).

-bob



More information about the Python-Dev mailing list