Unpickling crashing my machine

Peter Otten __peter__ at web.de
Fri Jul 30 04:52:02 EDT 2004


Pierre-Frédéric Caillaud wrote:

> 
> No response, so I'm reposting this as it seems an "interesting" problem...
> 
> I have a huge dataset which contains a lot of individual records
> represented by class instances.
> 
> I pickle this to a file:
> 
> way #1:
> for object in objects:
>     cPickle.dump( object, myfile, -1 )
> 
> way #2:
> p = cPickle.Pickler( myfile, -1 )
> for object in objects:
>     p.dump( object )
> 
> When I try to unpickle this big file:
> 
> p = cPickle.Unpickler( open( ... ) )
> # call p.load() many times, displaying a progress counter...
> 
> Loading the file generated by #1 works fine, with linear speed.
> Loading the file generated by #2:
> - the progress counter runs as fast as #1
> - eats all memory, then swap
> - when eating swap, the progress counter slows down a lot (of course)
> - and the process must be killed to save the machine.
> 
> I'm talking lots of memory here. The pickled file is about 80 MB; when
> loaded, it fits into RAM with no problem.
> However, I killed the #2 process when it had already hogged about 700 MB
> of RAM and showed no sign of wanting to stop.
> 
> What's the problem?

I have just tried to pickle the same object twice using both methods you
describe. The file created with a single Pickler is shorter than the one
written by repeated dump() calls, which I suppose create a new pickler for
every call. That means the Pickler keeps a memo of objects already written,
so that repeated references are stored only once, and the Unpickler has to
maintain a matching memo while reading. With one Pickler for the whole file,
that memo ends up referencing every record ever written, and I believe the
excessive growth of the Unpickler's memo is what you are seeing.
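
If you want to keep a single Pickler (and avoid the per-record setup cost of
method #1) without letting the memo grow, cPickle's Pickler objects have a
clear_memo() method. A minimal, untested sketch, assuming myfile is the file
object from your post and objects is your iterable of records:

    import cPickle

    def dump_records(objects, myfile):
        # One Pickler for the whole file, but its memo is emptied after
        # every record, so it never holds more than one record at a time.
        p = cPickle.Pickler(myfile, -1)
        for obj in objects:
            p.dump(obj)
            p.clear_memo()  # forget objects already written

    def load_records(myfile):
        # Each dump() above is a self-contained pickle, so the Unpickler's
        # memo indices restart with every record instead of accumulating.
        u = cPickle.Unpickler(myfile)
        while True:
            try:
                yield u.load()
            except EOFError:
                break

Each record is then pickled against a fresh memo, just like method #1, so
the reading side should only need memory for roughly one record at a time.
The trade-off is that objects shared between records get written once per
record rather than once for the whole file.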

Peter





