Pickling limitation with instances defining __cmp__/__hash__?
Erik Max Francis
max at alcyone.com
Mon Jun 27 21:13:46 EDT 2005
I've come across a limitation in unpickling certain types of complex
data structures which involve instances that override __hash__, and was
wondering if it was known (basic searches didn't seem to come up with
anything similar) and if there is a workaround for it short of
restructuring the data structures in question.
The fundamental issue rests with defining classes which override __cmp__
and __hash__ in order to be used as keys in dictionaries (and elements
of sets). __cmp__ and __hash__ are defined to manipulate a single
attribute of the class, which never changes for the lifetime of an
object. In a simplified form:

    class C:
        def __init__(self, x):
            self.x = x
        def __cmp__(self, other):
            return cmp(self.x, other.x)
        def __hash__(self):
            return hash(self.x)

Even if C contains other members which are manipulated, making it
technically mutable, since the one attribute (in this example, x) which
is used for __cmp__ and __hash__ is never changed after the creation of
the object, it is legal to use as a dictionary key. (Formally, the
attribute in question is a name which is guaranteed to be unique.)
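
To make the "technically mutable but still a legal key" point concrete, here is a minimal sketch. It is written for Python 3, where __cmp__ no longer exists, so __eq__ stands in for it; the attribute name `other` and the variable names are made up for illustration.

```python
class C:
    def __init__(self, x):
        self.x = x          # identity attribute, never changed after creation
        self.other = None   # hypothetical extra attribute that does change

    def __eq__(self, rhs):
        # Python 3 stand-in for the __cmp__ in the original class
        return self.x == rhs.x

    def __hash__(self):
        return hash(self.x)


c = C('name')
table = {c: 'value'}

# Mutating an attribute that takes no part in hashing or comparison
# leaves the dictionary lookup intact.
c.other = [1, 2, 3]
found = table[c]

# A distinct instance with the same x is treated as the same key.
found_by_equal = table[C('name')]
```

The lookup keeps working only because x itself is left alone; changing x while the object sits in a dictionary would break it.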
The difficulty arises when the data structures built up inside a C instance
contain a circular reference to that instance as a dictionary key. In my
particular case the situation is rather involved, but the simplest
example which reproduces the problem (using C) would be:

    c = C(1)
    c.m = {c: '1'}

So far this is fine and behaves as expected. Pickling the object c
results in no problems. Unpickling it, however, results in an error:

    import pickle

    data = pickle.dumps(c)
    d = pickle.loads(data) # line 25

    Traceback (most recent call last):
      File "/home/max/tmp/hash.py", line 25, in ?
        d = pickle.loads(data)
      File "/usr/local/lib/python2.4/pickle.py", line 1394, in loads
        return Unpickler(file).load()
      File "/usr/local/lib/python2.4/pickle.py", line 872, in load
        dispatch[key](self)
      File "/usr/local/lib/python2.4/pickle.py", line 1218, in load_setitem
        dict[key] = value
      File "/home/max/tmp/hash.py", line 15, in __hash__
        return hash(self.x)
    AttributeError: C instance has no attribute 'x'

By poking around, one can see that the error is occurring because the
unpickler algorithm is trying to use the instance as a key in a
dictionary before the instance has been completely initialized (in fact,
the __dict__ of this object is the empty dictionary!).
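
The empty __dict__ can be observed directly by instrumenting __hash__. The following is a sketch for Python 3 (with __eq__ in place of __cmp__), on the assumption that current CPython still restores the state in the same order as the 2.4 unpickler above; the list `seen` is a hypothetical helper that records what the instance looks like each time it is hashed.

```python
import pickle

seen = []  # snapshots of the instance __dict__ taken inside __hash__


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return self.x == other.x

    def __hash__(self):
        # Record the instance state before touching self.x
        seen.append(dict(self.__dict__))
        return hash(self.x)


c = C(1)
c.m = {c: '1'}           # hashes c once, with x already set
data = pickle.dumps(c)

del seen[:]              # keep only what happens during unpickling
try:
    pickle.loads(data)
    error = None
except AttributeError as exc:
    error = exc
```

After this runs, `seen` holds a single empty dictionary: the unpickler has created the bare instance and is filling the inner dictionary (which needs the hash of the key) before it has restored any of the instance's attributes.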
The error happens regardless of whether pickle or cPickle is used (so I
used pickle to give a more meaningful traceback above), and regardless of
whether the protocol is 0 or HIGHEST_PROTOCOL.
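
A quick way to check the protocol claim is to loop over every protocol the installed pickle module supports. This is a sketch for Python 3 (where cPickle has been folded into pickle, and __eq__ replaces __cmp__), not the original 2.4 test; the dictionary `failures` is a made-up name for collecting the results.

```python
import pickle


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)


c = C(1)
c.m = {c: '1'}

failures = {}  # protocol number -> error message seen while unpickling
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(c, protocol)   # pickling succeeds for every protocol
    try:
        pickle.loads(data)
    except AttributeError as exc:
        failures[protocol] = str(exc)
```

If the behaviour matches what is described above, every protocol from 0 up to HIGHEST_PROTOCOL ends up as a key in `failures`.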
Is this issue known? I don't see any mention of this kind of
circularity in section 3.14.4 of the Python Library Reference. Second, is there
any reasonably straightforward workaround to this limitation, short of
reworking things so that these self-referenced objects aren't used as
dictionary keys?
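
One possible direction, offered only as a sketch and not as an established fix: have the class tell pickle to rebuild the instance through its constructor, so that x exists before the rest of the state (including the self-referencing dictionary) is restored. The version below is for Python 3 and uses __reduce__; it assumes that __init__ takes x alone and is safe to call again at load time, which may not hold for the real classes.

```python
import pickle


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, C) and self.x == other.x

    def __hash__(self):
        return hash(self.x)

    def __reduce__(self):
        # Recreate the object as C(self.x) first, then apply the
        # remaining attributes as state once the instance exists.
        return (C, (self.x,), self.__dict__.copy())


c = C(1)
c.m = {c: '1'}

d = pickle.loads(pickle.dumps(c))
key = next(iter(d.m))   # the single key of the restored dictionary
```

With this arrangement the unpickler calls C(1) before it fills in the inner dictionary, so __hash__ finds x when the key is inserted, and the restored key is the restored instance itself. The trade-off is that __init__ now runs during unpickling, which the default mechanism avoids.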
--
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
You'll learn / Life is worth it / Watch the tables turn
-- TLC