Unable to read large files from zip

Nick Craig-Wood nick at craig-wood.com
Wed Aug 29 07:30:08 EDT 2007


Kevin Ar18 <kevinar18 at hotmail.com> wrote:
> 
>  I posted this on the forum, but nobody seems to know the solution: http://python-forum.org/py/viewtopic.php?t=5230
> 
>  I have a zip file that is several GB in size, and one of the files inside of it is several GB in size.  When it comes time to read the 5+GB file from inside the zip file, it fails with the following error:
>  File "...\zipfile.py", line 491, in read bytes = self.fp.read(zinfo.compress_size)
>  OverflowError: long int too large to convert to int

That will be a number of 2**31 (2 GB) or more, which can't be
converted to a 32-bit int.

That would be explained if zinfo.compress_size is 2 GB or more, e.g.

  >>> f=open("z")
  >>> f.read(2**31)
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  OverflowError: long int too large to convert to int

However, it would seem nuts that zipfile is trying to read > 2 GB
into memory at once!
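
Reading the member in fixed-size chunks would avoid both the huge
allocation and the oversized read() argument.  What follows is only a
sketch, assuming a zipfile module that provides ZipFile.open() (newer
Python releases do; check yours).  The function name
extract_member_in_chunks and the 1 MiB chunk size are my own choices,
not anything taken from zipfile.py:

```python
import shutil
import zipfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def extract_member_in_chunks(zip_path, member_name, dest_path,
                             chunk_size=CHUNK_SIZE):
    """Copy one archive member to dest_path a chunk at a time.

    Hypothetical helper: it never asks for more than chunk_size bytes
    in a single read, so the size passed to read() stays small.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.open() returns a file-like object for the member.
        with archive.open(member_name) as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)
```

You would call it as extract_member_in_chunks("big.zip", "huge.bin",
"huge.bin"), where both file names are placeholders.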

>  There have been one or more posts about 2GB limits with the zipfile
>  module, as well as this bug report:
>  http://bugs.python.org/issue1189216 Also, older zip formats have a
>  4GB limit.  However, I can't say for sure what the problem is.
>  Does anyone know if my code is wrong

Your code looks OK to me.

>  or if there is a problem with Python itself?

Looks likely.

>  If Python has a bug in it

...then you have the source and you can have a go at fixing it!

Try editing zipfile.py to print out some debug info and see if you
can fix the problem.  When you are done, submit the patch to the
Python bug tracker and you'll get that nice glow from helping others!
Remember, Python is open source and is made by *us* for *us* :-)
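
For the debug info, a quick first step is to list the sizes zipfile
has recorded for each member and see which ones cross the limit.  A
minimal sketch: the helper name oversized_members and its default
limit are my own; only infolist() and the ZipInfo attributes come from
zipfile itself.

```python
import zipfile

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def oversized_members(zip_path, limit=INT_MAX):
    """Return (name, compress_size, file_size) for members above limit.

    Hypothetical helper for debugging; it reads only the archive's
    directory, not the member data.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
            if info.compress_size > limit or info.file_size > limit
        ]
```

If the 5+ GB member shows up in that list, the read() call in the
traceback is being handed a size that will not fit in a 32-bit int.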

If you need help fixing zipfile.py then you'd probably be better off
asking on python-dev.

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick
