[Patches] [ python-Patches-1121142 ] ZipFile.open - read-only file-like obj for files in archive

SourceForge.net noreply at sourceforge.net
Sun Feb 18 19:41:43 CET 2007


Patches item #1121142, was opened at 2005-02-11 19:08
Message generated for change (Comment added) made by alanmcintyre
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1121142&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.5
Status: Open
Resolution: Out of Date
Priority: 5
Private: No
Submitted By: Alan McIntyre (alanmcintyre)
Assigned to: Nobody/Anonymous (nobody)
Summary: ZipFile.open - read-only file-like obj for files in archive

Initial Comment:
I originally started working on updating patch 992750,
but decided after a short while to just start from
scratch, so I'm posting it as a new patch.  Sorry if
this isn't appropriate.

This patch provides a new open() method on ZipFile;
this method returns a file-like object for the
requested item in the archive.  This file-like object
only provides a read() method.

ZipFile.read was modified to use the new open method
(this was suggested by loewis in reference to patch
992750).

The patched zipfile.py passed the existing tests in the
test_zipfile.py from CVS.  New tests were added to
verify the operation of the object returned by open().
These tests were modeled after existing tests for
ZipFile.read(); two read fixed-size chunks from the
file-like object, and two others read random-sized chunks.

I have only run the tests on Windows XP, using
Python2.4 from the official Windows installer.  I will
test the patch out on Linux over the weekend.

If the patch is accepted I'll also generate and submit
patches for the appropriate documentation as well.

----------------------------------------------------------------------

>Comment By: Alan McIntyre (alanmcintyre)
Date: 2007-02-18 13:41

Message:
Logged In: YES 
user_id=1115903
Originator: YES

Here is a version of the patch against the current trunk.  It passes a
"make test" on my Gentoo laptop at the moment, but I think I still need to
add some more tests (like reading from encrypted compressed files), and I
need to try out the test_zipfile64 stuff.  I don't have time today to look
through the module Glyph suggested, so it may be that after I do that I'll
have some more tweaks to make.  I'm pretty sure I'll be able to spend some
more time on Wednesday this week.  

I also included the full contents of zipfile.py and test_zipfile.py just
in case that's useful to somebody.  
File Added: zipfile_patch6.tgz

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2007-02-18 11:44

Message:
Logged In: YES 
user_id=1115903
Originator: YES

Thanks, glyph; I'll definitely have a look and see what I can steal. :-)

----------------------------------------------------------------------

Comment By: Glyph Lefkowitz (glyf)
Date: 2007-02-16 22:48

Message:
Logged In: YES 
user_id=226807
Originator: NO

Twisted also contains an implementation of this functionality, available
from 

   
http://twistedmatrix.com/trac/browser/trunk/twisted/python/zipstream.py

As far as I can tell it doesn't have anything to recommend it over the
attached patch, however (in fact, the test coverage of the attached patch
looks better), but perhaps it has some behavior which might be desirable
to steal.


----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2007-02-14 12:13

Message:
Logged In: YES 
user_id=1115903
Originator: YES

I will see if I can make some progress on this over the weekend.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-02-13 04:14

Message:
Logged In: YES 
user_id=21627
Originator: NO

Can you please update the patch to the current subversion trunk?  I'd like
to apply it, but the code of zipfile has changed so that the patch is
currently out-of-date. When redoing it, notice that the read
implementation has changed (I couldn't figure out how you moved code
around).

Please do use the trunk (not Python 2.5), and please submit the patch as a
single 'svn diff' output (rather than a tar file containing multiple
individual diff files).

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2005-05-30 21:56

Message:
Logged In: YES 
user_id=1115903

Revision 5 of this patch has been in constant use with
Python 2.4.1 in an application at my job for about a month;
it seems to be stable and useful in that regard.  If anybody
has time to review and accept (or reject) it I would
appreciate it.

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2005-04-26 21:23

Message:
Logged In: YES 
user_id=1115903

After testing on my large batch of large Zip files, I made
one fix (version 4 of the patch didn't read all the content
of large compressed archive items).  The current set of
changes is attached as zipfile_patch5.tgz.

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2005-04-13 11:34

Message:
Logged In: YES 
user_id=1115903

I found a problem with my universal newline handling code in
readline(); if the first byte of an '\r\n' pair was read
from file but the second byte didn't come in on that same
read, it resulted in the next line having an inappropriate
'\n' prepended to it.  zipfile_patch4.tgz has a fix for this
included (with everything else, of course).

I'm going to test the open() capability in a real
application with some reasonably large zip files (containing
files up to just short of 2GB) over the next few days, so if
any bugs or performance problems show up I may have some
more changes.

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2005-04-13 00:58

Message:
Logged In: YES 
user_id=1115903

Uploaded the third revision of this patch; passes all
regression tests against current CVS on WinXP.  I think all
the issues Martin brought up have been addressed except
perhaps for the case of compression rate <1.  I will have a
look at that when I have time; just wanted to get an update
here before the patch started to look abandoned. :)

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2005-03-14 09:37

Message:
Logged In: YES 
user_id=1115903

Hmm...I could have sworn I did the diff in the correct
order. I'll make sure next time. :)

Here's my comments on your remarks (in order):
- I'm adding support for universal newlines, and will reject
all modes that aren't legal combinations of r, U, and b.
- I'll see if I can make a Zip file store something with
compression < 1, and if I can I'll add a test case for it.
- I'll try work a .flush() in there on the compression
object and come up with a test case if possible
- .read(0) and .readline(0) will both return an empty string
with no side-effects, since this seems to be what builtin
files do.  

Right now ZipExtFile uses the ZipFile's file object, so you
pretty much have to do whatever you're going to do with the
ZipExtFile instance you get back from .open() before you use
that ZipFile instance in a way that moves the file pointer
around. 

I'm sure that somebody will eventually try to use the
ZipFile in this way, so it appears my options are either to
(1) give the ZipExtFile its own file object to use (when
possible), or (2) make sure this limitation is documented. 
#1 feels (to me) to be the "right thing" to do, so that's
what I'll try unless there's a good reason I shouldn't.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-03-01 02:59

Message:
Logged In: YES 
user_id=21627

The patch is reversed: usually, diff is invoked as "-c old new".

I think it is almost right, but I have a few remarks:
- by tradition, open() should have a mode argument,
defaulting to 'r'; it would be ok to raise exceptions if it
is anything else. However, do consider implementing
universal newlines; allowing 'b' as a no-op might also be
reasonable.
- I wonder what happens if the compression rate is < 1. It
would appear that the code might use too few rawbytes. I
suggest to recursively invoke read in this case.
- I wonder whether it could ever happen that there is still
data to uncompress in the zlib object, ie. whether it might
be necessary to invoke .flush() after exhausting the
rawbytes (and discard the zlib object afterwards)
- it appears that atleast the builtin file object implements
.read(0) as returning an empty string; the manual says that
the entire file is meant only if size is omitted or negative.

----------------------------------------------------------------------

Comment By: Alan McIntyre (alanmcintyre)
Date: 2005-02-27 01:28

Message:
Logged In: YES 
user_id=1115903

zipfile_patch2.tgz: I updated the file-like object to
support readline, readlines, __iter__ and next().  Added
tests for those new methods.  Also added a patch for the
documentation.

Passes regression tests on 2.5a0 built from CVS HEAD with
MSVC .NET on Windows XP.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1121142&group_id=5470


More information about the Patches mailing list