pickle.py and cPickle.c - persistent_id() is always called - why?

matsaleh at my-deja.com matsaleh at my-deja.com
Sat Apr 22 16:52:55 EDT 2000


Hello fellow .py types...

I am doing a bit of work using pickle/cPickle and am trying to
optimize it. I found that my user-defined persistent_id() function is
being called for (just about) every attribute (name and value) in my
object that is being pickled.
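The modern Python 3 pickle module makes this easy to see. There, the hook is a persistent_id() method on a pickle.Pickler subclass (the 2000-era API differed). This sketch counts the type of every object the hook is asked about. The Node class and CountingPickler are hypothetical stand-ins.

```python
import io
import pickle
from collections import Counter


class Node:
    """Hypothetical stand-in for an object in a containment hierarchy."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class CountingPickler(pickle.Pickler):
    """Records the type name of every object persistent_id() is asked about."""
    def __init__(self, file):
        super().__init__(file)
        self.seen = Counter()

    def persistent_id(self, obj):
        self.seen[type(obj).__name__] += 1
        return None  # None means "pickle obj normally"


root = Node("root", [Node("a"), Node("b")])
pickler = CountingPickler(io.BytesIO())
pickler.dump(root)
print(dict(pickler.seen))
```

Only three of the counted objects are Node instances. The rest are the class itself, each instance's state dict, the attribute-name strings, and the attribute values.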

My specific (somewhat ad-hoc) test case pickles ~100 objects in a
containment hierarchy, using persistent_id to break the containment
relationship and replace the references with a proprietary object id.
Although I have only ~100 object references to resolve, persistent_id()
is called ~6400 times. In tracing the code, it appears to be called for
every attribute and value in my objects.
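In modern Python 3, the containment-breaking setup described above might look roughly like this. It is a sketch that assumes a hypothetical Node class whose oid attribute is the proprietary id, plus a plain dict as the id-to-object registry. Each Node is pickled on its own, and any other Node it references is written as its id. On the way back, persistent_load() resolves each id.

```python
import io
import pickle


class Node:
    """Hypothetical object with a proprietary id and contained children."""
    def __init__(self, oid, children=()):
        self.oid = oid
        self.children = list(children)


class RefPickler(pickle.Pickler):
    """Pickles one Node; any *other* Node it references is written as an id."""
    def __init__(self, file, root):
        super().__init__(file)
        self.root = root

    def persistent_id(self, obj):
        if isinstance(obj, Node) and obj is not self.root:
            return obj.oid  # external reference: store only the id
        return None  # everything else is pickled normally


class RefUnpickler(pickle.Unpickler):
    """Resolves ids written by RefPickler through a lookup table."""
    def __init__(self, file, registry):
        super().__init__(file)
        self.registry = registry

    def persistent_load(self, pid):
        return self.registry[pid]


def dump_node(node):
    buf = io.BytesIO()
    RefPickler(buf, node).dump(node)
    return buf.getvalue()


def load_node(data, registry):
    return RefUnpickler(io.BytesIO(data), registry).load()


leaf = Node(2)
root = Node(1, [leaf])
registry = {2: leaf}
copy = load_node(dump_node(root), registry)
print(copy.oid, copy.children[0] is leaf)
```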

This appears to be caused by the fact that pickle.py:save() is called
with the default pers_save flag of 0 in all cases except for when it is
called by save_pers(). This seems to mean that objects of every type
(tuples, dicts, sequences, longs, strings, etc.) cause my persistent_id()
to be called, when all I want is for it to be called for object instances.
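A toy model makes the control flow of the recursive save() concrete. It is a simplification written for illustration, not the code of pickle.py. Only the pers_save flag and the save()/save_pers() names come from the post; ToyPickler and its pseudo-opcodes are made up.

```python
class ToyPickler:
    """Toy model of the recursive save() described above (not pickle.py).
    Instead of bytes it records a flat list of pseudo-opcodes."""

    def __init__(self, persistent_id):
        self.persistent_id = persistent_id  # user hook; None means "no pid"
        self.out = []

    def save(self, obj, pers_save=0):
        # The hook is consulted for every object reached, whatever its type,
        # unless save() was called from save_pers() with pers_save=1.
        if not pers_save:
            pid = self.persistent_id(obj)
            if pid is not None:
                self.save_pers(pid)
                return
        if obj is None or isinstance(obj, (int, float, str)):
            self.out.append(("ATOM", obj))
        elif isinstance(obj, (list, tuple)):
            self.out.append(("SEQ", len(obj)))
            for item in obj:
                self.save(item)  # default pers_save=0: the hook runs again
        elif isinstance(obj, dict):
            self.out.append(("DICT", len(obj)))
            for key, value in obj.items():
                self.save(key)    # keys (attribute names, for vars()) reach the hook
                self.save(value)  # and so do their values
        else:
            # Class instance: record its class, then save its attribute dict.
            self.out.append(("INST", type(obj).__name__))
            self.save(vars(obj))

    def save_pers(self, pid):
        # pers_save=1 keeps the hook from being asked about the pid itself.
        self.save(pid, pers_save=1)
        self.out.append(("PERSID",))


calls = []


def counting_hook(obj):
    calls.append(type(obj).__name__)
    return None


ToyPickler(counting_hook).save({"a": (1, 2), "b": "x"})
print(len(calls))  # 7: the dict, both keys, the tuple, its two ints, and "x"
```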

I modified pickle.py to change the default of the pers_save flag to 1
in the save() method, and then call it with a 0 only from within the
save_inst() method. This amounts to invoking my persistent_id()
function only when a reference to an object instance is encountered as
my objects are being pickled.
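Short of patching pickle.py, one user-level approximation is to keep the hook as cheap as possible for anything that is not one of your instances. This is an assumption, not the change described above. The hook is still called for every object, but the expensive part runs only for Node objects. Node, expensive_lookup() and the two counters are hypothetical.

```python
import io
import pickle


class Node:
    """Hypothetical application class."""
    def __init__(self, oid, children=()):
        self.oid = oid
        self.children = list(children)


class FastPathPickler(pickle.Pickler):
    def __init__(self, file, root):
        super().__init__(file)
        self.root = root
        self.total_calls = 0  # every call the pickler makes to the hook
        self.slow_calls = 0   # calls that got past the cheap type test

    def persistent_id(self, obj):
        self.total_calls += 1
        # Cheap exact-type test first: str, int, tuple, dict, list, ... exit here.
        if type(obj) is not Node:
            return None
        self.slow_calls += 1
        return None if obj is self.root else self.expensive_lookup(obj)

    def expensive_lookup(self, obj):
        # Stand-in for a costly id lookup (hypothetical).
        return obj.oid


root = Node(1, [Node(2), Node(3)])
p = FastPathPickler(io.BytesIO(), root)
p.dump(root)
print(p.total_calls, p.slow_calls)
```

With this three-node tree, slow_calls is 3 (one per Node). total_calls also includes the class, the root's state dict, the attribute names and the attribute values.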

My pickled objects do not appear to be adversely affected by this
change, and the number of calls to persistent_id() was reduced from
~6400 to ~500, reducing the time spent in this method by an order of
magnitude (0.37 sec to 0.03 sec).

I have not yet tested this change in cPickle.c, but in my original
tests persistent_id() was a much more significant factor in my cPickle
runs: all of cPickle's pickling code is in C and runs much faster than
my persistent_id() method, which is written in Python. I expect the
performance boost from making the change in cPickle to be even greater,
relatively speaking.
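In modern Python 3, the C accelerator module `_pickle` plays cPickle's role. `pickle.Pickler` is the C class when it is available, and the pure-Python implementation remains reachable as the private, CPython-specific `pickle._Pickler`. The rough harness below assumes those two names. It runs the same trivial hook under both implementations and reports how much of each dump is spent inside the hook body. It does not reproduce the post's measurements, and make_hook_pickler() and profile() are hypothetical helpers.

```python
import io
import pickle
import time


def make_hook_pickler(base):
    """Build a subclass of `base` whose persistent_id() tallies its own time."""
    class HookTimedPickler(base):
        def __init__(self, file):
            super().__init__(file)
            self.hook_calls = 0
            self.hook_time = 0.0

        def persistent_id(self, obj):
            start = time.perf_counter()
            self.hook_calls += 1
            result = None  # a real hook would look obj up here
            # Only the hook body is timed; the cost of the call itself is not.
            self.hook_time += time.perf_counter() - start
            return result
    return HookTimedPickler


def profile(base, obj):
    pickler = make_hook_pickler(base)(io.BytesIO())
    start = time.perf_counter()
    pickler.dump(obj)
    total = time.perf_counter() - start
    return pickler.hook_calls, pickler.hook_time, total


data = [{"name": str(i), "values": list(range(10))} for i in range(1000)]
for base in (pickle._Pickler, pickle.Pickler):
    calls, hook_time, total = profile(base, data)
    print(f"{base.__module__}.{base.__name__}: {calls} hook calls, "
          f"{hook_time:.4f}s in hook body, {total:.4f}s total")
```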

My question is, why is persistent_id() being invoked so often? I do not
see the reason for calling it for basic python types and structures
such as tuples, dicts, and the like. Is this a reasonable change to
make to the python source, or is it not a safe change for the general
pickling cases that the pickle/cPickle modules have to handle? I'm sure
this code has been scrutinized by many folks much more experienced
than I am, and I would be grateful for your insights.
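The following illustration does not come from the thread. Because the hook sees every object, it can externalize values that are not class instances. The hypothetical BlobPickler below (modern Python 3) moves large bytes values into a side list and writes only a key. A pickle.py that consulted persistent_id() only for instances would presumably never offer those values to such a hook.

```python
import io
import pickle


class BlobPickler(pickle.Pickler):
    """Hypothetical: moves every large bytes value into a side list and
    writes only a key, so the pickle itself stays small."""
    THRESHOLD = 1024

    def __init__(self, file, store):
        super().__init__(file)
        self.store = store

    def persistent_id(self, obj):
        if type(obj) is bytes and len(obj) >= self.THRESHOLD:
            key = ("blob", len(self.store))
            self.store.append(obj)
            return key
        return None


class BlobUnpickler(pickle.Unpickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store

    def persistent_load(self, pid):
        tag, index = pid
        if tag != "blob":
            raise pickle.UnpicklingError(f"unknown persistent id {pid!r}")
        return self.store[index]


record = {"name": "img", "data": b"\x00" * 4096}
store = []
buf = io.BytesIO()
BlobPickler(buf, store).dump(record)
restored = BlobUnpickler(io.BytesIO(buf.getvalue()), store).load()
print(len(buf.getvalue()), restored == record)
```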

Regards.






More information about the Python-list mailing list