[Python-Dev] Unpickling memory usage problem, and a proposed solution
Dan Gindikin
dgindikin at gmail.com
Fri Apr 23 23:11:34 CEST 2010
Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
>
> On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin <dgindikin <at> gmail.com> wrote:
> > This wouldn't help our use case: your code needs the entire pickle
> > stream to be in memory, which in our case would be about 475 MB, on
> > top of the 300 MB+ of data structures that generated the pickle
> > stream.
> >
>
> In that case, the best we could do is a two-pass algorithm to remove
> the unused PUTs. That won't be efficient, but it will satisfy the
> memory constraint.
That is what I'm doing for us right now.
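Roughly, the idea looks like the sketch below (not our production code; the
function name and chunk size are just illustrative, and pickletools.genops()
does the opcode scanning):

    import pickletools
    import shutil

    def _copy_bytes(src, dst, nbytes, bufsize=1 << 20):
        # Copy nbytes from src to dst in bounded chunks, so large stretches
        # of the stream never have to sit in memory all at once.
        while nbytes > 0:
            chunk = src.read(min(bufsize, nbytes))
            if not chunk:
                break
            dst.write(chunk)
            nbytes -= len(chunk)

    def strip_unused_puts(in_path, out_path):
        # Pass 1: record every memo index that is actually fetched back
        # (GET, BINGET, LONG_BINGET).
        used = set()
        with open(in_path, 'rb') as f:
            for opcode, arg, pos in pickletools.genops(f):
                if 'GET' in opcode.name:
                    used.add(arg)

        # Pass 2: find the byte ranges of PUT/BINPUT/LONG_BINPUT opcodes
        # whose memo index is never fetched.  An opcode's range ends where
        # the next opcode begins.
        skip = []                   # (start, end) byte offsets to drop
        with open(in_path, 'rb') as f:
            prev = None             # (opcode, arg, start) of previous opcode
            for opcode, arg, pos in pickletools.genops(f):
                if prev and 'PUT' in prev[0].name and prev[1] not in used:
                    skip.append((prev[2], pos))
                prev = (opcode, arg, pos)

        # Copy the stream, leaving the unused PUTs out.  Surviving PUTs keep
        # their original memo indices, so the matching GETs still resolve.
        with open(in_path, 'rb') as src, open(out_path, 'wb') as dst:
            offset = 0
            for start, end in skip:
                _copy_bytes(src, dst, start - offset)
                src.seek(end)
                offset = end
            shutil.copyfileobj(src, dst)

Only the set of referenced memo indices and the list of byte ranges to drop
have to fit in memory, not the pickle stream itself.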
> Another solution is to not generate the PUTs at all
> by setting the 'fast' attribute on Pickler. But that won't work if you
> have a recursive structure, or have code that requires the identity
> of objects to be preserved.
We definitely have some cross links amongst the objects, so we need PUTs.
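For anyone following along, this is roughly what the 'fast' option does, as
far as I understand it (the C and pure-Python picklers may differ in detail):
it suppresses memoization, so no PUTs are emitted, but shared references get
duplicated and self-referencing structures can't be pickled at all.

    import io
    import pickle

    buf = io.BytesIO()
    p = pickle.Pickler(buf, protocol=pickle.HIGHEST_PROTOCOL)
    p.fast = True             # suppress the memo: no PUT opcodes are written

    shared = [1, 2, 3]
    p.dump((shared, shared))  # pickles, but the shared list is written twice
                              # and no longer unpickles as the same object

    cyclic = []
    cyclic.append(cyclic)
    # p.dump(cyclic)          # a self-referencing structure fails in fast
                              # mode instead of pickling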
> By the way, it is weird that the total memory usage of the data
> structure is smaller than the size of its respective pickle stream.
> What pickle protocol are you using?
It's the highest protocol, but we have a bunch of extension types that
get expanded into Python tuples for pickling.
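To illustrate what I mean (with an invented type, not one of ours): a type
whose data lives in a compact C array can still __reduce__ to a plain Python
tuple, so every element gets re-encoded as a separate object in the stream,
which is how the pickle ends up bigger than the live data.

    class PackedPoints(object):
        """Invented stand-in for one of our extension types."""

        def __init__(self, coords):
            # In the real extension type this would be a contiguous C array;
            # a Python list stands in for it here.
            self._coords = list(coords)

        def __reduce__(self):
            # Pickled as (callable, args): the compact storage is expanded
            # into an ordinary tuple, so each coordinate is written to the
            # stream as a separate Python object.
            return (PackedPoints, (tuple(self._coords),))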