[Patches] [ python-Patches-1446489 ] zipfile: support for ZIP64
SourceForge.net
noreply at sourceforge.net
Sun Jun 11 22:33:28 CEST 2006
Patches item #1446489, was opened at 2006-03-09 06:58
Message generated for change (Comment added) made by greg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1446489&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Ronald Oussoren (ronaldoussoren)
Assigned to: Ronald Oussoren (ronaldoussoren)
Summary: zipfile: support for ZIP64
Initial Comment:
The attached patch implements support for ZIP64, that is zipfiles
containing very large (>4GByte) files and zipfiles that are larger than
4GByte themselves.
The output of this patch can be read by pkzip (see below for the actual
version I used for testing).
----------------------------------------------------------------------
>Comment By: Gregory P. Smith (greg)
Date: 2006-06-11 13:33
Message:
Logged In: YES
user_id=413
reading zipfile64-version64.patch:
* why does the zipfile module import itself?
* Why is the default ZIP64 limit 1 << 30? Shouldn't that be
(1 << 31) - 1 (or slightly less) for maximum compatibility with
existing <2GiB zip files or zips with data just under 2GiB?
Don't force zip64's use unless the size actually exceeds a
32-bit signed integer.
* assert diskno == 0 and assert nodisks == 1 should be
turned into BadZipFile exceptions with an explanation that
multi-disk zip files aren't supported.
* in main() document the -t option in the usage string.
* TestZip64InSmallFiles changes zipfile.ZIP64_LIMIT but will
not restore the value if a test fails (that could lead to
other unrelated test failures). That's not a problem in the
hopefully normal case of all tests passing, but use a
try/finally to make sure the value gets reset.
* documentation: "Is does optionally handle" is awkward.
how about "It can handle"
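Two of the review points above (an explicit exception instead of an assert for multi-disk archives, and try/finally around the ZIP64_LIMIT override) could be sketched roughly as follows. This is only an illustration, not code from the patch: check_single_disk and run_with_zip64_limit are hypothetical helper names, and the exception is spelled BadZipFile as in current Python.

```python
import zipfile


def check_single_disk(diskno, nodisks):
    """Reject multi-disk archives with an explicit error instead of an assert.

    Hypothetical helper; the parameter names follow the review comment.
    """
    if diskno != 0 or nodisks != 1:
        raise zipfile.BadZipFile("multi-disk zip files are not supported")


def run_with_zip64_limit(limit, func):
    """Call func() with zipfile.ZIP64_LIMIT overridden, always restoring it."""
    saved = zipfile.ZIP64_LIMIT
    zipfile.ZIP64_LIMIT = limit
    try:
        return func()
    finally:
        # Runs even if func() raises, so later tests see the original value.
        zipfile.ZIP64_LIMIT = saved
```

In a unittest class the same effect is usually obtained by saving the value in setUp and restoring it in tearDown.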
The removal of the file_offset attribute makes sense but
does make me wonder how much existing code it could break.
I suggest leaving file_offset out and, if any Python 2.5
beta tester complains, either restoring it or making the scan
that looks up file offsets a ZipFile option (defaulting to True).
----------------------------------------------------------------------
Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-30 06:28
Message:
Logged In: YES
user_id=580910
I've added some more tests for pre-existing functionality. The unittests are still
far from comprehensive, but at least touch upon most functionality of zipfile.
Does anyone feel like reviewing this? I'd like to get this into python2.5.
----------------------------------------------------------------------
Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-26 01:26
Message:
Logged In: YES
user_id=580910
I've attached yet another version. This version reintroduces some functionality
that was unintentionally removed and fixes a lame bug that caused
test_zipimport to fail.
----------------------------------------------------------------------
Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-23 06:10
Message:
Logged In: YES
user_id=580910
I've found some time to work on this. I've added
zipfile-zip64-version2.patch; this version:
* Makes zip64 behaviour optional (defaults to off because zip(1) doesn't
support zip64)
* Is significantly faster for large zipfiles because it doesn't scan the entire
zipfile just to check that the file headers are consistent with the central
directory w.r.t. filename (this check is now done when trying to read a file)
* Updates the reference documentation.
* Adds unittests. There are two sets of tests: one set tests the behaviour of
zip64 extensions using small files by lowering the zip64 cutoff point and is
run every time; the other set tests huge zipfiles and is run only when the
largefile feature is enabled when running the tests.
There is one backward-incompatible change: ZipInfo objects no longer have a
file_offset attribute. That was the other reason for scanning the entire zipfile
when opening it. IMNSHO this should have been a private attribute and the
cost of this feature is not worth its *very* limited usefulness. As an indication
of its cost: I got a 6x speedup when I removed the calculation of the
file_offset attribute, something that adds up when you are dealing with huge
zipfiles (I wrote this patch because I'm dealing with 10+GByte zipfiles with
tens of thousands of files at work).
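For code that still needs a member's data offset, one option is to compute it on demand from that member's local file header instead of scanning every header when the archive is opened. The sketch below assumes the standard 30-byte fixed local header layout and uses the header_offset attribute of ZipInfo; data_offset and LOCAL_HEADER are hypothetical names and are not part of zipfile or of this patch.

```python
import struct
import zipfile

# Fixed part of a local file header: signature, version, flags, method,
# time, date, CRC-32, compressed size, uncompressed size, name length,
# extra-field length (30 bytes in total).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def data_offset(fp, zinfo):
    """Return the offset of a member's data, reading only its local header."""
    fp.seek(zinfo.header_offset)
    fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise zipfile.BadZipFile("bad magic number for local file header")
    name_len, extra_len = fields[9], fields[10]
    # The data starts after the fixed header, the file name and the extra field.
    return zinfo.header_offset + LOCAL_HEADER.size + name_len + extra_len
```

This costs one seek and one small read per lookup, so an archive with tens of thousands of members only pays for the members that are actually queried.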
I noticed that zipfile raises RuntimeError in some places. I've changed one of
those to zipfile.BadZipfile, but others remain. I don't like this; most of them
should be replaced by TypeError or ValueError exceptions.
BTW, this patch also supports storing files >4GByte in the zipfile, but that
feature isn't very useful because zipfile doesn't have an API for reading file
data incrementally.
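For readers on later Python versions: ZipFile.open() was added after this comment was written (in Python 2.6) and returns a file-like object, which makes chunked reading possible. A minimal sketch under that assumption follows; iter_member_chunks and its chunk_size default are hypothetical choices, not part of zipfile.

```python
import zipfile


def iter_member_chunks(archive, name, chunk_size=64 * 1024):
    """Yield a member's decompressed data in chunks of at most chunk_size bytes."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(name) as member:
            while True:
                chunk = member.read(chunk_size)
                if not chunk:
                    break
                yield chunk
```

Because only one chunk is held in memory at a time, this pattern suits the >4GByte members that ZIP64 makes possible.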
----------------------------------------------------------------------
Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-16 00:55
Message:
Logged In: YES
user_id=580910
I haven't had time to work on this; all the time I had to work on Python-related
stuff has been eaten by finishing PyObjC's port to Intel Macs and the universal
binary patches.
The former is now done and the latter almost, so I'll have some time to work on
this again, especially because I'm using this patch at work and might be able to
claim some time to work on this during work hours.
----------------------------------------------------------------------
Comment By: Georg Brandl (gbrandl)
Date: 2006-05-16 00:41
Message:
Logged In: YES
user_id=849994
Since 2.5 beta is coming close, have you made progress on
the tests/docs?
----------------------------------------------------------------------
Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-04-02 12:13
Message:
Logged In: YES
user_id=580910
The "don't use the ZIP64 extension" flag is a good idea: zipfiles that use this
extension aren't readable by the infozip tools (zip and unzip on most unix
systems).
I'll add tests and documentation in the near future.
The version of zipfile that I'm currently using also contains a patch for
speeding up the opening of zipfiles, for the type of files I'm dealing with
(about 11GByte large with tens of thousands of files) the speedup is very
significant. I suppose it's better to file that as a separate patch after this has
been approved.
----------------------------------------------------------------------
Comment By: Anthony Baxter (anthonybaxter)
Date: 2006-04-01 21:02
Message:
Logged In: YES
user_id=29957
I'd like to see a testcase and possibly a note for the
documentation about the new semantics. Also, should it be
possible to say "don't use the ZIP64 extension, instead
raise an Error" for people who don't want to generate these?
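In the zipfile module as it exists in current Python, this request corresponds to the allowZip64 argument of ZipFile and the LargeZipFile exception. The sketch below shows that behaviour on a tiny archive; lowering zipfile.ZIP64_LIMIT is only a testing trick (the attribute is an implementation detail), and write_without_zip64 is a hypothetical helper name.

```python
import io
import zipfile


def write_without_zip64(data, limit):
    """Try to store data with ZIP64 disabled; return True if it was refused.

    zipfile.ZIP64_LIMIT is lowered temporarily so that the example does not
    need a multi-gigabyte payload.
    """
    saved = zipfile.ZIP64_LIMIT
    zipfile.ZIP64_LIMIT = limit
    try:
        with zipfile.ZipFile(io.BytesIO(), "w", allowZip64=False) as zf:
            zf.writestr("big.bin", data)
        return False
    except zipfile.LargeZipFile:
        # Raised when the archive would need ZIP64 but ZIP64 is disabled.
        return True
    finally:
        zipfile.ZIP64_LIMIT = saved
```

Callers who never want to produce ZIP64 archives can catch LargeZipFile and report the oversized input instead.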
----------------------------------------------------------------------
Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-03-09 07:28
Message:
Logged In: YES
user_id=580910
Oops, I've uploaded the wrong file. zipfile-zip64.patch is the correct one.
I've tested the correctness of created archives using this version of pkzip:
pkzipc -version
PKZIP(R) Server Version 8 ZIP Compression Utility for Linux X86
Copyright (C) 1989-2005 PKWARE, Inc. All Rights Reserved. Evaluation Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745 Patent Pending
Version 8.40.66
----------------------------------------------------------------------