pickle.py and cPickle.c - persistent_id is always called - why?
matsaleh at my-deja.com
Sat Apr 22 16:50:18 EDT 2000
Hello fellow .py types...
I am doing a bit of work using pickle/cPickle and
am trying to optimize. I found that my user-
defined persistent_id() function is being called
for (just about) every attribute (name and value)
in my object that is being pickled.
My specific (somewhat ad-hoc) test case pickles
~100 objects in a containment hierarchy, using
persistent_id to break the containment
relationship and replace the references with a
proprietary object id. Although I have only ~100
object references to resolve, persistent_id() is
called ~6400 times. In tracing the code, it
appears to be called for every attribute and
value in my objects.
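The behaviour described above can be reproduced with a small counting experiment. This is only a minimal sketch, written against the current `pickle.Pickler` subclassing interface rather than the pickle.py discussed in this post. The `Node` class, its `oid` attribute, and `CountingPickler` are hypothetical names invented for the illustration, and pickling the root's attribute dictionary is an assumption about how the containment might be broken.

```python
import io
import pickle


class Node:
    """Hypothetical stand-in for an object in the containment hierarchy."""

    def __init__(self, oid, name, children=()):
        self.oid = oid
        self.name = name
        self.children = list(children)


class CountingPickler(pickle.Pickler):
    """Replaces Node references with their oid and counts every hook call."""

    def __init__(self, file):
        super().__init__(file)
        self.calls = 0      # every invocation of persistent_id()
        self.resolved = 0   # invocations that actually returned an id

    def persistent_id(self, obj):
        self.calls += 1
        if isinstance(obj, Node):
            self.resolved += 1
            return obj.oid  # store the reference as an id only
        return None         # pickle this object normally


leaves = [Node(i, "leaf%d" % i) for i in range(1, 4)]
root = Node(0, "root", leaves)

pickler = CountingPickler(io.BytesIO())
# Pickle the root's own state; contained Nodes are written as ids.
pickler.dump(vars(root))

print("persistent_id calls:", pickler.calls)
print("references resolved:", pickler.resolved)
```

The call count comes out higher than the number of references resolved, because the hook is also consulted for the dictionary, its keys, and the plain values.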
This appears to be caused by the fact that
pickle.py:save() is called with the default
pers_save flag of 0 in all cases except for when
it is called by save_pers(). This appears to
indicate that all types: tuples, dicts,
sequences, longs, strings, etc, cause my
persistent_id() to be called, when all I want is
for it to be called for object instances.
I modified pickle.py to change the default of the
pers_save flag to 1 in the save() method, and
then call it with a 0 only from within the
save_inst() method. This amounts to invoking my
persistent_id() function only when a reference to
an object instance is encountered as my objects
are being pickled.
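For comparison, a similar saving can be approximated without editing pickle.py. The sketch below is not the patch described above: it is a user-level alternative, assuming the expensive part of persistent_id() is the id lookup and not the call itself. It puts a cheap type test first so the costly path runs only for instance references. `Persistent`, `GuardedPickler`, `RegistryUnpickler`, and `_lookup_id` are hypothetical names.

```python
import io
import pickle


class Persistent:
    """Hypothetical base class for objects that are stored by id."""

    def __init__(self, oid, children=()):
        self.oid = oid
        self.children = list(children)


class GuardedPickler(pickle.Pickler):
    def __init__(self, file):
        super().__init__(file)
        self.lookups = 0  # how often the costly path was taken

    def persistent_id(self, obj):
        # Cheap early exit for strings, numbers, lists, dicts, and so on.
        if not isinstance(obj, Persistent):
            return None
        self.lookups += 1
        return self._lookup_id(obj)

    def _lookup_id(self, obj):
        # Placeholder for whatever expensive id resolution is needed.
        return obj.oid


class RegistryUnpickler(pickle.Unpickler):
    """Resolves stored ids back to live objects through a registry dict."""

    def __init__(self, file, registry):
        super().__init__(file)
        self._registry = registry

    def persistent_load(self, pid):
        return self._registry[pid]


parts = [Persistent(i) for i in range(1, 4)]
owner = Persistent(0, parts)
registry = {p.oid: p for p in parts}

buf = io.BytesIO()
guarded = GuardedPickler(buf)
guarded.dump(vars(owner))  # assumption: each object's state is stored separately

buf.seek(0)
state = RegistryUnpickler(buf, registry).load()
print("costly lookups:", guarded.lookups)
```

The hook is still entered for every pickled value, so this cuts the cost per call and not the number of calls. Reducing the call count itself requires a change inside the pickler, as described above.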
My pickled objects do not appear to be adversely
affected by this change, and the number of calls
to persistent_id() was reduced from ~6400 to
~500, reducing the time spent in this method by
an order of magnitude (0.37 sec to 0.03 sec).
I have not yet tested this change in cPickle.c,
but in my original tests, persistent_id() was a
much more significant factor in my runs using
cPickle, because all the pickling code is in C
and is relatively much faster than my
persistent_id() method, which is in Python. I
expect the performance boost from making the change
in cPickle to be even greater, relatively speaking.
My question is, why is persistent_id() being
invoked so often? I do not see the reason for
calling it for basic python types and structures
such as tuples, dicts, and the like. Is this a
reasonable change to make to the python source,
or is it not a safe change for the general
pickling cases that the pickle/cPickle modules
have to handle? I'm sure this code has been
scrutinized by many folks much more experienced
than I - I would be grateful for your insights.