[Python-bugs-list] [ python-Bugs-467924 ] Improve the ZipFile Interface

SourceForge.net noreply@sourceforge.net
Thu, 31 Jul 2003 07:22:19 -0700


Bugs item #467924, was opened at 2001-10-04 11:54
Message generated for change (Comment added) made by mzimmerman
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=467924&group_id=5470

Category: Python Library
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Improve the ZipFile Interface

Initial Comment:
There are two methods for writing to a ZipFile:

     write(self, filename, arcname=None, compress_type=None)  
     writestr(self, zinfo, bytes)

but only one for reading from it:

     read(self, name)

In addition, the two write methods handle compression differently.

---
(a) 'read' is not the counterpart of 'write': 'write' takes a file on disk and adds
     it to a ZipFile, but 'read' does not do the reverse. 'read' should be called
     'readstr', since it is really the counterpart of 'writestr'.

(b) It is unclear what 'write' and 'read' actually mean. Does 'write' write to a file
     on disk, or into the ZipFile? It would be more obvious if ZipFile had four
     methods that fit together in pairs:

     writestr (self, zinfo, bytes)
          # same as now
     readstr (self, name)
          # returns bytes (as string), currently called 'read'
          # 'read' could still live but should be deprecated
     add (self, filename, arcname=None, compress_type=None)
          # currently 'write'
          # 'write' could still live but should be deprecated
     extract (self, name, filename, arcname=None)
          # new, desired functionality

(c) Both 'writestr' and 'add' should by default use the compression type that was
     passed to the constructor of 'ZipFile'. Currently 'write' does, but 'writestr'
     (via zinfo) does not: 'ZipInfo' hard-codes the compression to 'ZIP_STORED' :-(
     It should not do that! Instead, it should:
     - accept more parameters in its constructor, so that the compression type
        (and some other attributes) can be passed in
     - default the compression type to 'None', so that 'writestr' can detect this
        and take the default from the 'ZipFile' instance.
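
The four paired methods proposed in (b) can be sketched as a thin subclass. This
is only an illustration under stated assumptions: 'PairedZipFile' and
'extract_to' are hypothetical names (the latter chosen so it cannot clash with
any 'extract' method that ZipFile itself may provide), and the unexplained
'arcname' argument of the proposed 'extract' is left out.

```python
import os
import tempfile
import zipfile


class PairedZipFile(zipfile.ZipFile):
    """Hypothetical subclass illustrating the proposed method pairs."""

    def readstr(self, name):
        # Counterpart of writestr(): return the member's contents from memory.
        return self.read(name)

    def add(self, filename, arcname=None, compress_type=None):
        # Counterpart of extract_to(): copy a file on disk into the archive.
        self.write(filename, arcname, compress_type)

    def extract_to(self, name, filename):
        # Copy one member out of the archive to an explicit path on disk.
        # Always written in binary mode; text-mode handling is not attempted.
        with open(filename, "wb") as out:
            out.write(self.read(name))


def demo():
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "src.txt")
    with open(src, "wb") as f:
        f.write(b"hello")

    archive = os.path.join(workdir, "demo.zip")
    with PairedZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.add(src, "from_disk.txt")              # disk -> archive
        zf.writestr("from_memory.txt", b"world")  # memory -> archive

    dst = os.path.join(workdir, "out.txt")
    with PairedZipFile(archive) as zf:
        in_memory = zf.readstr("from_memory.txt")  # archive -> memory
        zf.extract_to("from_disk.txt", dst)        # archive -> disk

    with open(dst, "rb") as f:
        on_disk = f.read()
    return in_memory, on_disk
```

Keeping 'read' and 'write' untouched in the base class matches the suggestion
that the old names could live on while being deprecated.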





----------------------------------------------------------------------

Comment By: Matt Zimmerman (mzimmerman)
Date: 2003-07-31 10:22

Message:
Logged In: YES 
user_id=196786

It would also be very useful to be able to have ZipFile
read/write the uncompressed file data from/to a file-like
object, instead of just strings and files (respectively).

I would like to use this module to work with zip files
containing large files, but this is unworkable because the
current implementation would use excessive amounts of memory.

Currently, read() reads all of the compressed data into
memory, then uncompresses it into memory.  For files which
may be hundreds of megabytes compressed, this is undesirable.

Likewise for write(), I would like to be able to stream data
into a zip file, passing in a ZipInfo to specify the
metadata as is done with writestr().
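
For illustration, the kind of streaming described here can be sketched with
ZipFile.open(), which later Python versions provide (write mode needs Python
3.6 or newer; nothing like it is assumed for the version under discussion).
The helper names and the chunk size are hypothetical.

```python
import io
import shutil
import zipfile


def stream_member_out(zf, name, dest, chunk_size=64 * 1024):
    # Decompress one member into a writable file-like object chunk by chunk,
    # so the whole member is never held in memory at once.
    with zf.open(name) as src:
        shutil.copyfileobj(src, dest, chunk_size)


def stream_member_in(zf, zinfo, source, chunk_size=64 * 1024):
    # Compress data from a readable file-like object into a new member whose
    # metadata comes from the given ZipInfo, as with writestr().
    with zf.open(zinfo, "w") as dst:
        shutil.copyfileobj(source, dst, chunk_size)


def demo():
    payload = b"x" * 200_000
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zinfo = zipfile.ZipInfo("big.bin")
        zinfo.compress_type = zipfile.ZIP_DEFLATED
        stream_member_in(zf, zinfo, io.BytesIO(payload))

    out = io.BytesIO()
    with zipfile.ZipFile(buf) as zf:
        stream_member_out(zf, "big.bin", out)
    return out.getvalue() == payload
```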

The implementation of this functionality is quite
straightforward, but I am not sure whether (or how) the
interface should change.  Some other parts of the library
allow for a file object to be passed to the same interface
which accepts a filename.  The object is examined to see if
it has the necessary read/write methods and if not, it is
assumed to be a filename.  Would this be the correct way to
do it?
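
A minimal sketch of that convention, for reference. The helper names are
hypothetical and only the read side is shown: an object with a read() method
is used as-is, and anything else is treated as a filename.

```python
import io
import os
import tempfile


def open_source(source):
    """Return (fileobj, should_close) for a filename or a readable object."""
    if hasattr(source, "read"):
        # Already file-like: use it directly and leave closing to the caller.
        return source, False
    # Otherwise assume it is a filename and open it ourselves.
    return open(source, "rb"), True


def read_all(source):
    fileobj, should_close = open_source(source)
    try:
        return fileobj.read()
    finally:
        if should_close:
            fileobj.close()


def demo():
    from_object = read_all(io.BytesIO(b"abc"))

    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"xyz")
    try:
        from_path = read_all(path)
    finally:
        os.remove(path)
    return from_object, from_path
```

The should_close flag keeps ownership clear: only files opened by the helper
itself are closed by it.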

I, too, am a bit irked by the lack of symmetry exhibited by
read vs. write/writestr, and think that the interface
suggested above would be a significant improvement.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-01-05 15:54

Message:
Logged In: YES 
user_id=92689

In Python 2.3, writestr() has an enhanced signature: the
first arg may now also be an archive name, in which case the
correct default settings are used (i.e. the compression value
is taken from the ZipFile instance). See patch #651621.
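
The difference between the two call styles can be checked with a short
script. This sketch is written against a current Python 3 zipfile module,
which is an assumption on my part and not the 2.3 code itself; the member
names are made up.

```python
import io
import zipfile


def compression_used():
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Archive name given: the ZipFile's own compression setting is used.
        zf.writestr("by_name.txt", b"a" * 1000)
        # ZipInfo given: its own compress_type is used, which is ZIP_STORED
        # unless the caller changes it.
        zf.writestr(zipfile.ZipInfo("by_zipinfo.txt"), b"a" * 1000)
        return {info.filename: info.compress_type for info in zf.infolist()}
```

So the complaint in (c) still applies to the ZipInfo form: the caller has to
set compress_type on the ZipInfo explicitly to get compression.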

extract() could be moderately useful (although I don't
understand the 'arcname' arg, how's that different from
'name'?) but would have to deal with file modes (bin/text).
The file mode isn't in the archive so would have to
(optionally) be supplied by the caller.

----------------------------------------------------------------------
