
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP. I can open a pseudo-file for STORED files in binary read mode, for example, to allow reading zip-in-zip files without fully occupying memory. -- Scott David Daniels Scott.Daniels@Acm.Org
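A minimal sketch of the STORED pseudo-file idea, written against today's Python 3 zipfile rather than the 2.4/2.5 API under discussion. StoredMemberView and open_stored_member are hypothetical names; the 30-byte local file header layout follows PKWARE's APPNOTE. Because a STORED member's bytes sit verbatim in the outer archive, a seekable read-only window over them is enough for zipfile to open a nested zip without reading the whole inner archive into memory.

```python
import io
import struct
import zipfile

# PKWARE local file header: signature, version needed, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (30 bytes in all).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_HEADER_SIG = 0x04034B50


class StoredMemberView(io.RawIOBase):
    """Read-only, seekable window onto bytes [start, start + size) of fileobj."""

    def __init__(self, fileobj, start, size):
        super().__init__()
        self._f = fileobj
        self._start = start
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self._pos + offset
        elif whence == io.SEEK_END:
            new = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new < 0:
            raise OSError("negative seek position")
        self._pos = new
        return new

    def readinto(self, buf):
        # Read only the requested slice, straight from the outer archive.
        n = max(0, min(len(buf), self._size - self._pos))
        if n == 0:
            return 0
        self._f.seek(self._start + self._pos)
        data = self._f.read(n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


def open_stored_member(fileobj, info):
    """Return a view of a STORED member's bytes inside fileobj (no copy)."""
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{info.filename} is not STORED")
    fileobj.seek(info.header_offset)
    fields = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
    if fields[0] != LOCAL_HEADER_SIG:
        raise zipfile.BadZipFile("bad local file header signature")
    name_len, extra_len = fields[9], fields[10]
    data_start = info.header_offset + LOCAL_HEADER.size + name_len + extra_len
    return StoredMemberView(fileobj, data_start, info.compress_size)


# Build an inner zip, then store it uncompressed inside an outer zip.
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, "w", zipfile.ZIP_DEFLATED) as inner:
    inner.writestr("hello.txt", "hello from the inner zip")

outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w", zipfile.ZIP_STORED) as outer:
    outer.writestr("inner.zip", inner_buf.getvalue())

with zipfile.ZipFile(outer_buf) as outer:
    view = open_stored_member(outer_buf, outer.getinfo("inner.zip"))
    with zipfile.ZipFile(view) as inner:
        payload = inner.read("hello.txt")

print(payload)  # b'hello from the inner zip'
```

The view never holds more than one read request's worth of the inner archive; the trade-off is that it only works for STORED members, since compressed members have no byte-for-byte correspondence to seek into.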

Scott David Daniels <Scott.Daniels@Acm.Org> wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
I can open a pseudo-file for STORED files in binary read mode, for example, to allow reading zip-in-zip files without fully occupying memory.
I'm not sure that zipfile needs BZIP2 support...being that there is a bzip2 module. - Josiah

On Dec 27, 2004, at 8:43 PM, Josiah Carlson wrote:
Scott David Daniels <Scott.Daniels@Acm.Org> wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
I can open a pseudo-file for STORED files in binary read mode, for example, to allow reading zip-in-zip files without fully occupying memory.
I'm not sure that zipfile needs BZIP2 support...being that there is a bzip2 module.
Note that the bzip2 module is named bz2 and does provide a file-like interface. Also, it is implemented entirely as a C extension (ick). -bob
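For illustration, the file-like interface Bob mentions, shown in today's Python 3 form (passing a file object instead of a filename to bz2.BZ2File has been possible since Python 3.3). This is a sketch only; note that it produces a bare bzip2 stream and does not by itself place anything inside a zip archive.

```python
import bz2
import io

# Compress into an in-memory buffer through BZ2File's file-like interface.
raw = io.BytesIO()
with bz2.BZ2File(raw, "wb") as f:
    f.write(b"line one\nline two\n")

# Rewind and read it back line by line, as with any binary file.
raw.seek(0)
with bz2.BZ2File(raw, "rb") as f:
    lines = f.readlines()

print(lines)  # [b'line one\n', b'line two\n']
```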

Josiah Carlson wrote:
Scott David Daniels <Scott.Daniels@Acm.Org> wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
I can open a pseudo-file for STORED files in binary read mode, for example, to allow reading zip-in-zip files without fully occupying memory.
I'm not sure that zipfile needs BZIP2 support...being that there is a bzip2 module.
- Josiah
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gman...
But if you look at the zip file format document, BZIP2 is a compression method you can use (per file) in a zip archive. In fact, I do use bz2 to compress and decompress, but the compressed data still has to live inside the archive. -- Scott David Daniels Scott.Daniels@Acm.Org

Scott David Daniels wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
AFAIR, compression mechanisms are identified by numbers in the zip file. So you should not bother with such a change unless there is some "official" specification that explains how bzip2 is used in zip files. Indeed, looking at http://www.pkware.com/company/standards/appnote/ you'll see that PKWARE has assigned compression method 12 to bzip2. You might want to take a look at the spec to see what else the Python implementation lacks, and either document those features as deliberately missing or as TODO, or just implement them right away. Regards, Martin
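As a later data point, Python 3.3 added a zipfile.ZIP_BZIP2 constant for exactly this method number. A minimal sketch, assuming Python 3.3 or newer: it writes one bzip2-compressed member, checks the method ID recorded for it, and then pulls the raw member bytes out from behind the 30-byte local file header to show that they are an ordinary bz2 stream living inside the archive.

```python
import bz2
import io
import struct
import zipfile

# Compression method numbers from PKWARE's APPNOTE, as exposed by zipfile.
assert zipfile.ZIP_STORED == 0
assert zipfile.ZIP_DEFLATED == 8
assert zipfile.ZIP_BZIP2 == 12

text = b"Project Gutenberg etext. " * 200

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_BZIP2) as zf:
    zf.writestr("etext.txt", text)

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("etext.txt")
    method = info.compress_type
    roundtrip = zf.read("etext.txt")

# The member payload follows the local file header:
# 30 fixed bytes, then the file name, then the extra field.
buf.seek(info.header_offset)
header = buf.read(30)
name_len, extra_len = struct.unpack("<HH", header[26:30])
buf.seek(info.header_offset + 30 + name_len + extra_len)
payload = buf.read(info.compress_size)

print(method, payload[:3], bz2.decompress(payload) == text)  # 12 b'BZh' True
```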

Scott David Daniels wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
Encryption/decryption support. Will most likely require a C extension since the algorithm relies on ints (or longs, don't remember) wrapping around when the value becomes too large. -Brett

Encryption/decryption support. Will most likely require a C extension since the algorithm relies on ints (or longs, don't remember) wrapping around when the value becomes too large.
You may want to do this in C for speed, but C-style int wrapping is easily done by doing something like "x = x & 0xFFFFFFFFL" at crucial points in the code (for unsigned 32-bit ints) with an additional "if x & 0x80000000L: x -= 0x100000000L" to simulate signed 32-bit ints. -- Guido van Rossum (home page: http://www.python.org/~guido/)
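Guido's two masking steps as small helpers, sketched in Python 3 syntax, where the trailing L on long literals no longer exists because all ints are arbitrary precision. The helper names wrap_u32 and wrap_s32 are illustrative, not from any library.

```python
MASK32 = 0xFFFFFFFF


def wrap_u32(x):
    """Reduce x modulo 2**32, like C unsigned 32-bit arithmetic."""
    return x & MASK32


def wrap_s32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed int."""
    x &= MASK32
    if x & 0x80000000:
        x -= 0x100000000
    return x


print(wrap_u32(0xFFFFFFFF + 1))  # 0
print(wrap_s32(0x7FFFFFFF + 1))  # -2147483648
print(wrap_s32(-1))              # -1
```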

Brett C. wrote:
Scott David Daniels wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
Encryption/decryption support. Will most likely require a C extension since the algorithm relies on ints (or longs, don't remember) wrapping around when the value becomes too large.
I'm trying to use byte-block streams (iterators taking iterables) as the basic structure of getting data in and out. I think the encryption/decryption can then be plugged in at the right point. If it can be set up properly, you can import the encryption separately and connect it to zipfiles with a call. Would this address what you want? I believe there is an issue actually building in the encryption/decryption in terms of redistribution. -- Scott David Daniels Scott.Daniels@Acm.Org
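A minimal sketch of the "iterators taking iterables" shape, with hypothetical stage names: each stage consumes an iterable of byte blocks and yields transformed blocks, so stages compose by nesting calls. The xor_blocks stage is only a placeholder marking where a cipher would plug in (after compression when writing, before decompression when reading); it is not PKZIP encryption and provides no security.

```python
import bz2


def chunks(data, size=4096):
    """Split bytes into an iterable of fixed-size blocks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def bz2_compress_blocks(blocks):
    comp = bz2.BZ2Compressor()
    for block in blocks:
        out = comp.compress(block)
        if out:
            yield out
    yield comp.flush()


def bz2_decompress_blocks(blocks):
    decomp = bz2.BZ2Decompressor()
    for block in blocks:
        out = decomp.decompress(block)
        if out:
            yield out


def xor_blocks(blocks, key):
    """Toy stand-in for a pluggable cipher stage; NOT real encryption."""
    pos = 0  # global stream offset, so block boundaries don't matter
    for block in blocks:
        yield bytes(b ^ key[(pos + j) % len(key)] for j, b in enumerate(block))
        pos += len(block)


data = b"It was the best of times, it was the worst of times. " * 500
key = b"secret"

# Writing: compress, then pass through the cipher stage.
stored = b"".join(xor_blocks(bz2_compress_blocks(chunks(data)), key))

# Reading: cipher stage first, then decompress; one block at a time.
restored = b"".join(bz2_decompress_blocks(xor_blocks(chunks(stored), key)))
print(restored == data)  # True
```

Because every stage only sees one block at a time, an encryption module imported separately could be inserted into the chain without zipfile needing to know about it.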

Scott David Daniels wrote:
Brett C. wrote:
Scott David Daniels wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5. My primary motivation is that Project Gutenberg seems to be starting to use BZIP2 compression for some of its zips. What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
Encryption/decryption support. Will most likely require a C extension since the algorithm relies on ints (or longs, don't remember) wrapping around when the value becomes too large.
I'm trying to use byte-block streams (iterators taking iterables) as the basic structure of getting data in and out. I think the encryption/decryption can then be plugged in at the right point. If it can be set up properly, you can import the encryption separately and connect it to zipfiles with a call. Would this address what you want? I believe there is an issue actually building in the encryption/decryption in terms of redistribution.
Possibly. Encryption is part of the PKZIP spec, so I was just thinking of covering that, not adding external encryption support. It really is not overly complex stuff; I will probably just want to do it in C for speed, as Guido suggested (but, as always, I would profile first to see if performance is really that bad). -Brett
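The encryption in the PKZIP spec is the traditional PKWARE stream cipher described in APPNOTE. A minimal pure-Python sketch of its key schedule and byte stream, masking at the points where the C reference code relies on 32-bit overflow. ZipCryptoKeys, encrypt, and decrypt are hypothetical names; building the 12-byte encryption header (random bytes plus a check byte) is left to the caller here.

```python
def _make_crc_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


def crc32_step(crc, byte):
    """One raw CRC-32 table step, without zlib's pre/post inversion."""
    return (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]


class ZipCryptoKeys:
    def __init__(self, password):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self.update(byte)

    def update(self, byte):
        self.k0 = crc32_step(self.k0, byte)
        # C relies on unsigned 32-bit overflow here; Python needs the mask.
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)

    def stream_byte(self):
        temp = (self.k2 | 2) & 0xFFFF  # "unsigned short temp" in the spec
        return ((temp * (temp ^ 1)) >> 8) & 0xFF


def encrypt(password, plaintext):
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for p in plaintext:
        out.append(p ^ keys.stream_byte())
        keys.update(p)  # keys always advance on the plaintext byte
    return bytes(out)


def decrypt(password, ciphertext):
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for c in ciphertext:
        p = c ^ keys.stream_byte()
        keys.update(p)
        out.append(p)
    return bytes(out)


secret = b"some member data"
ct = encrypt(b"password", secret)
print(decrypt(b"password", ct) == secret)  # True
```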

Scott David Daniels wrote:
I believe there is an issue actually building in the encryption/decryption in terms of redistribution.
Submitters should not worry about this too much. The issue primarily exists in the U.S., there are now official (U.S.) procedures for dealing with it, and the PSF can and does follow these procedures. Regards, Martin

Scott David Daniels wrote:
What other wish list things do people around here have for zipfile? I thought I'd collect input here and make a PEP.
I was working on a project based around modifying zip files, and found that Python just doesn't implement that part. I'd like to see the ability to remove a file from the archive, as well as to "write over" a file already in the archive. It's a tall order, but you asked. ;) Thanks, -Shane Holloway
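Short of in-place removal, one workaround that uses only the public zipfile API is to copy the archive, skipping or substituting members along the way. A minimal sketch for Python 3; rewrite_zip is a hypothetical helper. It decompresses and recompresses every member, so its cost grows with the whole archive, and it does not handle encrypted members.

```python
import io
import zipfile


def rewrite_zip(src, dst, remove=(), replace=None):
    """Copy src to dst, dropping names in `remove` and substituting `replace` data."""
    replace = replace or {}
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        zout.comment = zin.comment
        for info in zin.infolist():
            if info.filename in remove:
                continue
            # Fresh ZipInfo so writestr() does not mutate zin's entries.
            new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new.compress_type = info.compress_type
            new.external_attr = info.external_attr
            new.comment = info.comment
            data = replace.get(info.filename)
            if data is None:
                data = zin.read(info.filename)
            zout.writestr(new, data)


src = io.BytesIO()
with zipfile.ZipFile(src, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("b.txt", "beta")
    zf.writestr("c.txt", "gamma")

dst = io.BytesIO()
rewrite_zip(src, dst, remove={"b.txt"}, replace={"c.txt": b"GAMMA v2"})

with zipfile.ZipFile(dst) as zf:
    names = zf.namelist()
    new_c = zf.read("c.txt")
print(names, new_c)  # ['a.txt', 'c.txt'] b'GAMMA v2'
```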
participants (8)
- "Martin v. Löwis"
- Bob Ippolito
- Brett C.
- Brett C.
- Guido van Rossum
- Josiah Carlson
- Scott David Daniels
- Shane Holloway (IEEE)