[ python-Feature Requests-467924 ] Improve the ZipFile Interface
SourceForge.net
noreply at sourceforge.net
Sun Sep 25 22:20:20 CEST 2005
Feature Requests item #467924, was opened at 2001-10-04 15:54
Message generated for change (Comment added) made by scott_daniels
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=467924&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Improve the ZipFile Interface
Initial Comment:
There are two methods for writing to a ZipFile:
  write(self, filename, arcname=None, compress_type=None)
  writestr(self, zinfo, bytes)
but only one for reading from it:
  read(self, name)
Additionally, the two write methods behave differently with respect to compression.
---
(a) 'read' does not mirror 'write': 'write' takes a file on disk and adds it to the ZipFile,
but 'read' is not the reverse operation. 'read' should be called 'readstr', since it
matches 'writestr' much better.
(b) It is confusing what 'write' and 'read' actually mean. Does 'write' write a file,
or write into the ZipFile? It would be clearer if ZipFile had four methods that
fit together in pairs:
writestr (self, zinfo, bytes)
# same as now
readstr (self, name)
# returns bytes (as string), currently called 'read'
# 'read' could still live but should be deprecated
add (self, filename, arcname=None, compress_type=None)
# currently 'write'
# 'write' could still live but should be deprecated
extract (self, name, filename, arcname=None)
# new, desired functionality
(c) Both 'writestr' and 'add' should by default use the compression type that was
passed to the 'ZipFile' constructor. Currently 'write' does this, but 'writestr' (via
zinfo) does not: 'ZipInfo' hard-codes the compression to 'ZIP_STORED' :-(
It should not do that! Instead it should:
- accept more parameters in its constructor
  to also pass the compression type (and some other attributes, too)
- default to 'None', so that 'writestr' can detect this and take
  the default from the 'ZipFile' instance.
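A minimal sketch of the proposed pairing, written as a hypothetical 'PairedZipFile' subclass of the 'zipfile.ZipFile' class in current Python 3. 'readstr' and 'add' simply delegate to the existing 'read' and 'write'. The proposed 'extract(name, filename)' is named 'extract_to' here, because today's 'ZipFile' already has an 'extract(member, path)' method that extracts into a directory rather than to a chosen filename. None of these three names exist in the standard library.

```python
import shutil
import zipfile


class PairedZipFile(zipfile.ZipFile):
    """Hypothetical subclass sketching the paired methods proposed above.

    readstr/writestr move bytes in and out of the archive;
    add/extract_to move files on disk in and out of the archive.
    """

    def readstr(self, name):
        # Counterpart of writestr(): return the member's contents as bytes.
        return self.read(name)

    def add(self, filename, arcname=None, compress_type=None):
        # Counterpart of extract_to(): copy a file on disk into the archive.
        # With compress_type=None, ZipFile.write() falls back to the
        # compression passed to the constructor.
        self.write(filename, arcname, compress_type)

    def extract_to(self, name, filename):
        # Hypothetical: copy one member to an explicit target filename,
        # streaming in chunks instead of loading it all into memory.
        with self.open(name) as src, open(filename, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

Subclassing keeps the old 'read' and 'write' available, which matches the suggestion that they could live on (deprecated) next to the new names.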
----------------------------------------------------------------------
Comment By: Scott David Daniels (scott_daniels)
Date: 2005-09-25 20:20
Message:
Logged In: YES
user_id=493818
I am currently working on an expanded zipfile module that:
(a) Has a more easily extensible class
(b) Allows BZIP2 compression (my original need)
(c) Allows file-like (read) access to the elements of ZipFile
(d) Provides for a single "writer" which can be used to
generate file contents "incrementally" while possibly
reading from other "files" in the zipfile
(e) Allows the opening of embedded zips "in-place"
What I don't have at the moment is a good set of tests
or good documentation on how to use it. Anyone interested
in collaborating, let me know.
--Scott David Daniels
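For comparison, the zipfile module in current Python 3 covers parts of (b) and (c): 'ZIP_BZIP2' was added in Python 3.3, and 'ZipFile.open()' returns a file-like object for reading a member and, since Python 3.6, for writing one incrementally with mode "w". The sketch below writes a BZIP2-compressed member a line at a time and reads it back line by line. The helper names 'write_lines' and 'read_lines' are hypothetical, and the sketch does not attempt (d)'s reading-while-writing or (e).

```python
import io
import zipfile


def write_lines(path, member, lines):
    # Incrementally write text lines into one BZIP2-compressed member.
    # ZipFile.open(..., mode="w") needs Python 3.6+; ZIP_BZIP2 needs 3.3+
    # and a Python built with the bz2 module.
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_BZIP2) as zf:
        with zf.open(member, mode="w") as raw, \
                io.TextIOWrapper(raw, encoding="utf-8") as text:
            for line in lines:
                text.write(line + "\n")


def read_lines(path, member):
    # File-like read access: decompress and decode one line at a time.
    with zipfile.ZipFile(path) as zf:
        with zf.open(member) as raw, \
                io.TextIOWrapper(raw, encoding="utf-8") as text:
            return [line.rstrip("\n") for line in text]
```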
----------------------------------------------------------------------
Comment By: Chuck Rhode (crhode)
Date: 2005-09-22 15:56
Message:
Logged In: YES
user_id=988879
I've been trying to read map files put out by the Census
Bureau. These ZIP archives are downloaded from government
contractors' sites by county. Within each county archive
are several ZIP files, one for each map layer (roads, streams,
waterbodies, etc.). Each contains the elements of an ESRI
shapefile database (.shp, .shx, and .dbf files). This
doesn't make a lot of sense to me, either, because there's
no compression advantage to making an archive of an archive.
The technique is used purely for organizational purposes
because ZIP does not compress subdirectories.
Note: I've never seen a TAR of TAR files because TAR *does*
compress subdirectories.
What I've been struggling with is a way to leave these
archives in their compressed form and still do *python* I/O
on them. There is a tree organization to them, after all,
just as with traditional os.path directories. I've designed
some objects that let me retrieve the most recent file, ZIP
member, or TAR member by name from a given path to a
repository of such archives. What I get is a StreamIO
object that I can subsequently put back where it came from.
What would be nice is if there already were objects
available to manipulate normal os.path directories commingled
with ZIP and TAR archives. What would be nicer is if I/O
could be opened at the character/line level transparently
without regard to whether the source/destination was a file
or an archive member within such a structure. In the days
of hardware compression and on-the-fly encryption/decryption
of I/O, is this too much to ask? -ccr-
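For the nested-archive case, current Python 3's zipfile can already read an inner ZIP without extracting anything to disk, because 'ZipFile' accepts a file-like object as well as a filename. The sketch below uses a hypothetical helper, 'read_nested', that follows a chain of member names down through nested archives. It loads each intermediate archive into memory with io.BytesIO, which suits small per-layer archives like these but not very large ones, and it covers ZIP only, not TAR members or plain directories.

```python
import io
import zipfile


def read_nested(archive_path, *members):
    """Hypothetical helper: return the bytes of a member reached through
    zero or more nested ZIP archives, e.g.
    read_nested("county.zip", "roads.zip", "roads.dbf").

    Each intermediate archive is held in memory (io.BytesIO); nothing
    is written to the filesystem. At least one member name is required.
    """
    *inner_archives, target = members
    zf = zipfile.ZipFile(archive_path)
    try:
        for name in inner_archives:
            data = zf.read(name)          # bytes of the inner .zip member
            zf.close()
            zf = zipfile.ZipFile(io.BytesIO(data))
        return zf.read(target)
    finally:
        zf.close()                        # ZipFile.close() is safe to repeat
```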
----------------------------------------------------------------------
Comment By: Myers Carpenter (myers_carpenter)
Date: 2004-05-09 18:23
Message:
Logged In: YES
user_id=335935
The zipfile interface should match the tarfile interface.
At the minimum it should work for this example:
import zipfile
zf = zipfile.open("sample.zip", "r")
for zipinfo in zf:
    print zipinfo.name, "is", zipinfo.size, "bytes in size"
    zf.extract(zipinfo)
zf.close()
This closely matches the 'tarfile' module.
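The loop above uses a proposed API. With the zipfile module in current Python 3 the same loop can be written as below: 'ZipFile' is not iterable the way 'TarFile' is, so the sketch iterates over 'infolist()', and 'ZipInfo' uses 'filename' and 'file_size' where 'TarInfo' uses 'name' and 'size'. The function name 'list_and_extract' is hypothetical.

```python
import zipfile


def list_and_extract(path, dest):
    # Report each member's size and extract it under dest.
    # ZipFile.extract() accepts either a member name or a ZipInfo.
    sizes = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            print(info.filename, "is", info.file_size, "bytes in size")
            zf.extract(info, dest)
            sizes.append((info.filename, info.file_size))
    return sizes
```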
----------------------------------------------------------------------
Comment By: Matt Zimmerman (mzimmerman)
Date: 2003-07-31 14:22
Message:
Logged In: YES
user_id=196786
It would also be very useful to be able to have ZipFile
read/write the uncompressed file data from/to a file-like
object, instead of just strings and files (respectively).
I would like to use this module to work with zip files
containing large files, but this is unworkable because the
current implementation would use excessive amounts of memory.
Currently, read() reads all of the compressed data into
memory, then uncompresses it into memory. For files which
may be hundreds of megabytes compressed, this is undesirable.
Likewise for write(), I would like to be able to stream data
into a zip file, passing in a ZipInfo to specify the
metadata as is done with writestr().
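In current Python 3 this streaming is possible through 'ZipFile.open()': reading returns a file-like object that decompresses on demand, and since Python 3.6 mode "w" accepts a 'ZipInfo' describing the new member. The sketch below copies data in fixed-size chunks with shutil.copyfileobj in both directions; the helper names 'stream_member_to' and 'stream_into' and the 1 MiB chunk size are assumptions for illustration.

```python
import shutil
import zipfile


def stream_member_to(zf, name, dst, chunk_size=1 << 20):
    # Decompress one member into any writable binary file-like object,
    # chunk by chunk, instead of zf.read(name), which returns it all at once.
    with zf.open(name) as src:
        shutil.copyfileobj(src, dst, chunk_size)


def stream_into(zf, zinfo, src, chunk_size=1 << 20):
    # Compress data from any readable binary file-like object into a new
    # member described by zinfo (mode "w" needs Python 3.6+).
    # force_zip64=True allows a member larger than 2 GiB when the size
    # is not known in advance.
    with zf.open(zinfo, mode="w", force_zip64=True) as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Note that a fresh 'ZipInfo' defaults to ZIP_STORED, so the caller sets 'compress_type' on it explicitly.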
The implementation of this functionality is quite
straightforward, but I am not sure whether (or how) the
interface should change. Some other parts of the library
allow for a file object to be passed to the same interface
which accepts a filename. The object is examined to see if
it has the necessary read/write methods and if not, it is
assumed to be a filename. Would this be the correct way to
do it?
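The duck-typing approach described above can be sketched as a small hypothetical helper, 'add_from', on top of current Python 3's zipfile: anything with a callable 'read' attribute is treated as a stream, and anything else is assumed to be a filename. The helper is an illustration, not part of the standard library.

```python
import shutil
import zipfile


def add_from(zf, source, arcname):
    """Hypothetical helper: accept either a filename or a readable
    binary file-like object for the new member arcname.
    """
    if callable(getattr(source, "read", None)):
        # Stream the object's bytes into a new member (Python 3.6+);
        # a string arcname picks up the ZipFile's default compression.
        with zf.open(arcname, mode="w") as dst:
            shutil.copyfileobj(source, dst)
    else:
        # Otherwise treat source as a path on disk.
        zf.write(source, arcname)
```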
I, too, am a bit irked by the lack of symmetry exhibited by
read vs. write/writestr, and think that the interface
suggested above would be a significant improvement.
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2003-01-05 20:54
Message:
Logged In: YES
user_id=92689
In Python 2.3, writestr() has an enhanced signature: the
first arg may now also be an archive name, in which case the
correct default settings are used (i.e. the compression value
is taken from the ZipFile). See patch #651621.
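The difference is easy to observe in current Python 3, where 'writestr' still behaves this way. In the sketch below, writing by archive name inherits the ZipFile's ZIP_DEFLATED setting, while writing through a freshly constructed 'ZipInfo' uses that object's own 'compress_type', which defaults to ZIP_STORED (the behaviour criticised in point (c)). Current versions also accept an explicit 'compress_type' argument to 'writestr' that overrides either default; the member names here are arbitrary.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # Archive name: writestr builds the ZipInfo and copies the
    # ZipFile's compression (ZIP_DEFLATED here) into it.
    zf.writestr("by_name.txt", b"data " * 100)
    # Explicit ZipInfo: its own compress_type is used, and a fresh
    # ZipInfo defaults to ZIP_STORED.
    zf.writestr(zipfile.ZipInfo("by_zinfo.txt"), b"data " * 100)

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, info.compress_type)
```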
extract() could be moderately useful (although I don't
understand the 'arcname' arg: how is that different from
'name'?) but would have to deal with file modes (bin/text).
The file mode isn't in the archive so would have to
(optionally) be supplied by the caller.
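One way a caller-supplied mode could look, sketched as a hypothetical 'extract_as' function on top of current Python 3's zipfile (the name, the 'text' flag and the UTF-8 default are all assumptions). In binary mode the bytes are copied unchanged; in text mode the member is decoded with universal-newline reading and written in text mode, so line endings follow the platform convention on output.

```python
import io
import shutil
import zipfile


def extract_as(zf, name, filename, text=False, encoding="utf-8"):
    """Hypothetical extract(name, filename) where the caller chooses the
    file mode, since the archive does not record text vs. binary.
    """
    with zf.open(name) as raw:
        if not text:
            # Binary: copy the member's bytes verbatim.
            with open(filename, "wb") as dst:
                shutil.copyfileobj(raw, dst)
            return
        # Text: decode (universal newlines on read), then re-encode
        # with platform line endings on write.
        with io.TextIOWrapper(raw, encoding=encoding) as src:
            with open(filename, "w", encoding=encoding) as dst:
                shutil.copyfileobj(src, dst)
```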
----------------------------------------------------------------------
More information about the Python-bugs-list
mailing list