[Numpy-discussion] cannot pickle large numpy objects when memory resources are already stressed

Travis Oliphant oliphant at ee.byu.edu
Wed Mar 14 12:46:46 EDT 2007


Glen W. Mabey wrote:

>Hello,
>
>After running a simulation that took 6 days to complete, my script
>proceeded to attempt to write the results out to a file, pickled.
>
>The operation failed even though there was 1G of RAM free (4G machine).  
>I've since reconsidered using the pickle format for storing data sets 
>that include large numpy arrays.  However, I had somehow assumed that one
>would be able to pickle anything already held in memory, but I
>see now that this was a rash assumption.
>
>Ought there to be a way to do this, or should I forget about being able
>to bundle large numpy arrays and other objects in a single pickle?
>
>Thanks,
>Glen
>
>
>(these commands were performed on a different machine with 1G of RAM)
>
>In [10]: za = numpy.zeros( (100000000,), dtype=numpy.float32 )
>
>In [11]: import cPickle
>
>In [12]: zfile = file( '/tmp/zfile', 'w' )
>
>In [13]: cPickle.dump( za, zfile )
>---------------------------------------------------------------------------
>  
>
The pickle operation requires building a string copy of the data in memory 
before it is written out, so dumping a large array needs roughly the 
array's size again in free memory.  This is a limitation of the pickle 
format, as far as I've been able to figure out.
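
One way around it for now (just a sketch -- the file names and the 
separate metadata pickle are only for illustration) is to let the array 
write its own bytes with tofile, which streams straight from the array's 
buffer, and pickle only the small pieces:

    import cPickle
    import numpy

    za = numpy.zeros( (100000000,), dtype=numpy.float32 )

    # The raw data goes straight from the array's memory to disk; no
    # intermediate string is built, so peak usage stays at one copy.
    za.tofile( '/tmp/za.dat' )

    # Everything small (shape, dtype, any other results) can still be
    # bundled together in a single pickle.
    meta = { 'shape' : za.shape,
             'dtype' : za.dtype.str,
             'notes' : 'anything else you want to keep' }
    mfile = file( '/tmp/za.meta', 'wb' )
    cPickle.dump( meta, mfile, 2 )
    mfile.close()

    # Reading it back:
    meta = cPickle.load( file( '/tmp/za.meta', 'rb' ) )
    zb = numpy.fromfile( '/tmp/za.dat', dtype=meta['dtype'] )
    zb = zb.reshape( meta['shape'] )

That keeps the other objects together in one pickle; only the big block 
of data bypasses it.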

Perhaps when the new bytes type is added to Python we will have a way to 
view a memory area as a bytes object and be able to make a pickle 
without creating that extra copy in memory.
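
Just to illustrate the idea with what is already there (a rough sketch, 
not a fix):

    import numpy

    za = numpy.zeros( (1000,), dtype=numpy.float32 )

    # buffer() already gives a read-only, copy-free view of the array's
    # memory -- len(view) counts the 4000 data bytes, but none of them
    # are duplicated.
    view = buffer( za )
    assert len( view ) == za.nbytes

    # str(view) would build the full in-memory copy, which is essentially
    # what cPickle.dump ends up doing with the array's data today.

What is missing is a pickle path that can hand such a view to the file 
object directly.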

-Travis



