cannot pickle large numpy objects when memory resources are already stressed
Hello,

After running a simulation that took 6 days to complete, my script proceeded to attempt to write the results out to a file, pickled. The operation failed even though there was 1G of RAM free (4G machine). I've since reconsidered using the pickle format for storing data sets that include large numpy arrays. However, somehow I assumed that one would be able to pickle anything that you already had in memory, but I see now that this was a rash assumption.

Ought there to be a way to do this, or should I forget about being able to bundle large numpy arrays and other objects in a single pickle?

Thanks,
Glen

(these commands performed on a different machine with 1G RAM)

In [10]: za = numpy.zeros( (100000000,), dtype=numpy.float32 )
In [11]: import cPickle
In [12]: zfile = file( '/tmp/zfile', 'w' )
In [13]: cPickle.dump( za, zfile )
---------------------------------------------------------------------------
exceptions.MemoryError                    Traceback (most recent call last)
Glen W. Mabey wrote:
Hello,
After running a simulation that took 6 days to complete, my script proceeded to attempt to write the results out to a file, pickled.
The operation failed even though there was 1G of RAM free (4G machine). I've since reconsidered using the pickle format for storing data sets that include large numpy arrays. However, somehow I assumed that one would be able to pickle anything that you already had in memory, but I see now that this was a rash assumption.
Ought there to be a way to do this, or should I forget about being able to bundle large numpy arrays and other objects in a single pickle?
Thanks, Glen
(these commands performed on a different machine with 1G RAM)
In [10]: za = numpy.zeros( (100000000,), dtype=numpy.float32 )
In [11]: import cPickle
In [12]: zfile = file( '/tmp/zfile', 'w' )
In [13]: cPickle.dump( za, zfile )
---------------------------------------------------------------------------
exceptions.MemoryError                    Traceback (most recent call last)
The pickle operation requires making a string from the data in memory before it is written out. This is a limitation of the pickle format, as far as I've been able to figure out.

Perhaps when the new bytes type is added to Python we will have a way to view a memory area as a bytes object and be able to make a pickle without creating that extra copy in memory.

-Travis
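[Editor's note: for the large-array part of the data, one workaround not raised in the thread is to bypass pickle entirely and use NumPy's own binary format, which streams the array buffer to disk behind a small header instead of building a pickled string in memory first. A minimal sketch (the array is scaled down for demonstration; the file path is arbitrary):

```python
import os
import tempfile

import numpy as np

# Illustration only: np.save writes the array's raw buffer to disk
# directly, avoiding the full in-memory copy that cPickle.dump makes.
za = np.zeros((1_000_000,), dtype=np.float32)  # much smaller than the 100M-element original

path = os.path.join(tempfile.mkdtemp(), "zfile.npy")
np.save(path, za)

# np.load can memory-map the file, so nothing is read until it is touched.
zb = np.load(path, mmap_mode="r")
assert zb.shape == za.shape and zb.dtype == za.dtype
```

This does not bundle arbitrary Python objects alongside the array the way a single pickle would, but for the array data itself it sidesteps the extra copy.]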
On Wed, Mar 14, 2007 at 09:46:46AM -0700, Travis Oliphant wrote:
Perhaps when the new bytes type is added to Python we will have a way to view a memory area as a bytes object and be able to make a pickle without creating that extra copy in memory.
Perhaps this is an aspect that could be mentioned in the PEP as motivation? It seems kind of minor though; there certainly are more compelling justifications for the PEP. Thanks, Glen
El dc 14 de 03 del 2007 a les 09:46 -0700, en/na Travis Oliphant va escriure:
Glen W. Mabey wrote:
Hello,
After running a simulation that took 6 days to complete, my script proceeded to attempt to write the results out to a file, pickled.
The operation failed even though there was 1G of RAM free (4G machine). I've since reconsidered using the pickle format for storing data sets that include large numpy arrays. However, somehow I assumed that one would be able to pickle anything that you already had in memory, but I see now that this was a rash assumption.
Ought there to be a way to do this, or should I forget about being able to bundle large numpy arrays and other objects in a single pickle?
If you can afford using another package for doing I/O, perhaps PyTables can save your day. It is optimized for saving and retrieving very large amounts of data with ease. In particular, it can save your in-memory arrays without the need to make another copy in memory (provided the array is contiguous). It also allows compressing the data in a transparent way, without using additional memory. Furthermore, a recent optimization introduced in the 2.0 branch a week ago also allows *updating* an array on disk without making copies either.

HTH,

--
Francesc Altet   |  Be careful about using the following code --
Carabos Coop. V. |  I've only proven that it works,
www.carabos.com  |  I haven't tested it. -- Donald Knuth
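[Editor's note: the on-disk update idea Francesc describes can also be sketched with plain NumPy via `np.memmap` — this is a NumPy-only substitution for illustration, not PyTables itself, and gains none of PyTables' transparent compression:

```python
import os
import tempfile

import numpy as np

# Illustration only: np.memmap maps a file into memory, so slices can
# be created and updated in place without a full in-memory copy.
path = os.path.join(tempfile.mkdtemp(), "zfile.dat")

# Create the on-disk array and initialize it.
m = np.memmap(path, dtype=np.float32, mode="w+", shape=(1_000_000,))
m[:] = 0.0

# Later: update a slice in place; only the touched pages are written.
m[10:20] = 1.0
m.flush()

# Reopen read-only to confirm the update landed on disk.
check = np.memmap(path, dtype=np.float32, mode="r", shape=(1_000_000,))
assert check[15] == 1.0 and check[0] == 0.0
```

PyTables adds chunked storage, compression, and metadata on top of this basic memory-mapping pattern.]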
participants (3)
- Francesc Altet
- Glen W. Mabey
- Travis Oliphant