[Tutor] memory error files over 100MB

Mon Mar 16 17:30:51 CET 2009

Cheetah1000 wrote:
> I can't speak for Python 2.6, but using Jython 2.1 (Python 2.1 for Java),
> the code only looks at the file you are trying to extract()/read().  Near
> the end of the zip archive is a directory of all the files in the archive,
> with the start position and length of each file.  Jython's zipfile (written
> in python, naturally) only reads from the start position to the end of the
> file.  More information on this can be found by searching for 'Central
> Directory' of a zip archive.  Still doesn't explain the 100M problem though. 
> I'm having the same issue extracting 20MB from a 22MB zip.  It could be
> Python; or it could be, in my case, a java issue.  

The zipped size is not what matters here.

Since you read the unzipped version into memory, how big is the data once
unzipped? That is the amount of memory you need for the file data alone.
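
A minimal sketch of how to check the unzipped size without extracting
anything, assuming Python 3's standard zipfile module. The helper name
unzipped_sizes and the in-memory demo archive are made up for illustration;
with a real archive you would pass its path instead.

```python
import io
import zipfile


def unzipped_sizes(zip_source):
    """Return {member name: uncompressed size in bytes} from the archive's directory."""
    with zipfile.ZipFile(zip_source) as archive:
        # file_size is the uncompressed size; compress_size is the zipped size.
        return {info.filename: info.file_size for info in archive.infolist()}


# Demo archive built in memory (hypothetical content: one megabyte of zero bytes).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("zeros.bin", b"\0" * 1_000_000)

sizes = unzipped_sizes(demo)
print(sizes)
```

The numbers come from the archive's directory, so nothing is decompressed to
get them.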

To have a working program, you have to add the memory footprint of the running
Python or Java process itself to that number.

I don't know exactly what code is executed for an assignment, but
**possibly** the 'read()' is executed first (loading a very big string into
memory), and only then is the value assigned to the variable (which releases
the previous value of the variable).
That means that just after reading but before assigning, you **may** have two
very big strings in memory, neither of which can be garbage collected yet.
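
If that is what happens, one possible workaround is to drop the old value
yourself before calling read() again. This is only a sketch, assuming Python 3
and the standard zipfile module; the function name and the demo archive are
hypothetical. In CPython the old string is freed as soon as its last reference
goes away; under Jython it is up to the Java garbage collector, so there the
effect is less certain.

```python
import io
import zipfile


def read_members_one_at_a_time(zip_source):
    """Read each member fully, but keep at most one member's data referenced."""
    results = []
    with zipfile.ZipFile(zip_source) as archive:
        data = None
        for name in archive.namelist():
            # Release the previous member's data BEFORE read() allocates the
            # next one, so two big strings are never referenced at once.
            data = None
            data = archive.read(name)
            results.append((name, len(data)))
    return results


# Demo archive built in memory (hypothetical member names and contents).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("a.txt", b"hello")
    archive.writestr("b.txt", b"goodbye")

result = read_members_one_at_a_time(demo)
print(result)
```

This still needs room for one complete unzipped member at a time, so it only
lowers the peak; it does not remove the limit.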

All that data needs to fit in the memory of a single process on your
machine.

As already discussed, the only proper way of dealing with such files is to
read the data in smaller blocks instead of with a single read() call. Memory
use then no longer depends on the file size, and you can handle zip files up
to disk capacity.
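
A minimal sketch of reading in blocks, assuming Python 3 (ZipFile.open() was
added in Python 2.6, so as far as I know it is not available in Jython 2.1).
The function name extract_in_chunks, the 64 KB block size and the in-memory
demo archive are my own choices for illustration.

```python
import io
import zipfile

CHUNK_SIZE = 64 * 1024  # arbitrary block size, chosen for this example


def extract_in_chunks(zip_source, member, out_file, chunk_size=CHUNK_SIZE):
    """Copy one archive member to out_file block by block; return bytes written."""
    total = 0
    with zipfile.ZipFile(zip_source) as archive:
        # open() returns a file-like object that decompresses on demand.
        with archive.open(member) as src:
            while True:
                block = src.read(chunk_size)
                if not block:  # empty bytes means end of member
                    break
                out_file.write(block)
                total += len(block)
    return total


# Demo archive built in memory (hypothetical member name and contents).
payload = b"0123456789" * 100_000  # one million bytes
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("big.dat", payload)

out = io.BytesIO()
written = extract_in_chunks(demo, "big.dat", out)
print(written)
```

In a real program out_file would be a file opened with open(path, "wb"), so
that only one block of unzipped data is held in memory at any moment.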

Sincerely,
Albert