Pickle dict subclass instances using new protocol in PEP 307

Jimmy Retzlaff jimmy at retzlaff.com
Thu Oct 16 01:55:34 EDT 2003


I have a subclass of dict that acts kind of like Windows' file systems -
keys are case insensitive but case preserving (keys are assumed to be
strings, or at least they have to support .lower()). It's worked well
for quite a while - it used to inherit from UserDict and it has
inherited from dict since that became possible.

I just tried to pickle an instance of this class for the first time
using Python 2.3.2 on Windows. If I use protocols 0 (text) or 1 (binary)
everything works great. If I use protocol 2 (PEP 307) then I have a
problem when loading my pickle. Here is a small sample to illustrate:

######

import pickle

class myDict(dict):
    def __init__(self, *args, **kwargs):
        self.x = 1
        dict.__init__(self, *args, **kwargs)

    def __getstate__(self):
        print '__getstate__ returning', (self.copy(), self.x)
        return (self.copy(), self.x)

    def __setstate__(self, (d, x)):
        print '__setstate__'
        print '    object already in state:', self
        print '    x already in self:', 'x' in dir(self)
        self.x = x
        self.update(d)

    def __setitem__(self, key, value):
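        # Also report whether self.x exists yet, as in the output below
        print '__setitem__', (key, value), '- self.x exists:', 'x' in dir(self)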
        dict.__setitem__(self, key, value)


d = myDict()
d['key'] = 'value'

protocols = [(0, 'Text'), (1, 'Binary'), (2, 'PEP 307')]
for protocol, description in protocols:
    print '--------------------------------------'
    print 'Pickling with Protocol %s (%s)' % (protocol, description)
    pickle.dump(d, file('test.pickle', 'wb'), protocol)
    del d
    print 'Unpickling'
    d = pickle.load(file('test.pickle', 'rb'))

######

When run it prints:

__setitem__ ('key', 'value') - self.x exists: True
--------------------------------------
Pickling with Protocol 0 (Text)
__getstate__ returning ({'key': 'value'}, 1)
Unpickling
__setstate__
    object already in state: {'key': 'value'}
    x already in self: False
--------------------------------------
Pickling with Protocol 1 (Binary)
__getstate__ returning ({'key': 'value'}, 1)
Unpickling
__setstate__
    object already in state: {'key': 'value'}
    x already in self: False
--------------------------------------
Pickling with Protocol 2 (PEP 307)
__getstate__ returning ({'key': 'value'}, 1)
Unpickling
__setitem__ ('key', 'value') - self.x exists: False
__setstate__
    object already in state: {'key': 'value'}
    x already in self: False
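
To see where the extra __setitem__ call in the protocol 2 run comes from, the pickle stream itself can be inspected with the standard pickletools module. What follows is only a minimal sketch under stated assumptions: MyDict is a reduced, hypothetical stand-in for the sample class, opcode_names is a hypothetical helper, and the code is written for a current Python 3 interpreter rather than 2.3. The exact opcodes are a CPython implementation detail and may differ between versions.

```python
import pickle
import pickletools


class MyDict(dict):
    """Hypothetical, reduced stand-in for the sample class above."""

    def __init__(self, *args, **kwargs):
        self.x = 1
        dict.__init__(self, *args, **kwargs)

    def __getstate__(self):
        # Keep the state a plain int so that the only SETITEM in the
        # stream is the one that stores the 'key' entry.
        return self.x

    def __setstate__(self, x):
        self.x = x


def opcode_names(obj, protocol):
    """Return the names of the opcodes in obj's pickle, in stream order."""
    data = pickle.dumps(obj, protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


d = MyDict()
d['key'] = 'value'

names_v1 = opcode_names(d, 1)
names_v2 = opcode_names(d, 2)
print(names_v1)
print(names_v2)
```

In the protocol 2 list I would expect SETITEM to sit between NEWOBJ and BUILD, meaning the item is stored on the freshly created instance before the state is restored. In the protocol 1 list SETITEM should come before REDUCE, meaning it fills a plain dict that is then passed to the reconstructor.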


The problem stems from the fact that the subclass's __setitem__ is
called before __setstate__ when loading a protocol 2 pickle (with
protocols 0 and 1 the subclass's __setitem__ is not called at all). If I
don't define __getstate__ and __setstate__, I have the same problem:
the subclass's __setitem__ is called before the pickle mechanism has
restored the instance variables. I need to access one of those
instance variables in my __setitem__.
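
One possible workaround, offered only as a sketch and not as the recommended fix, is to take control of what the pickler is given by overriding __reduce_ex__ so that no dict items are handed over and everything travels in the state. The assumptions here: MyDict and its setitem_calls counter are hypothetical illustrations, the code targets a current Python 3 interpreter, and it relies on CPython's dict.update not calling an overridden __setitem__.

```python
import pickle


class MyDict(dict):
    """Hypothetical dict subclass whose __setitem__ needs self.x."""

    setitem_calls = 0  # class-level counter, only to make the effect visible

    def __init__(self, *args, **kwargs):
        self.x = 1
        dict.__init__(self, *args, **kwargs)

    def __setitem__(self, key, value):
        MyDict.setitem_calls += 1
        _ = self.x  # raises AttributeError if x has not been set yet
        dict.__setitem__(self, key, value)

    def __reduce_ex__(self, protocol):
        # Three-element form (callable, args, state): no dict items are
        # handed to the pickler, so all content travels inside the state.
        return (self.__class__, (), (dict(self), self.x))

    def __setstate__(self, state):
        items, x = state
        self.x = x
        # Call dict.update directly; in CPython it does not go through
        # the overridden __setitem__.
        dict.update(self, items)


d = MyDict()
d.x = 42
d['Key'] = 'value'

# Round-trip the same instance through protocols 0, 1 and 2.
clones = [pickle.loads(pickle.dumps(d, protocol)) for protocol in (0, 1, 2)]
```

Because the callable is the class itself, __init__ runs on load and sets x before anything else happens, and __setstate__ then restores both the items and the saved x. The trade-off is that the class no longer benefits from whatever the default reduction would do for it.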

I suppose my question is one of practicality. I'd like my class
instances to work with all pickle protocols. Am I getting too fancy
trying to inherit from dict? Should I go back to UserDict or maybe to
DictMixin? Should I submit a bug report on this, or am I getting too
close to internals to expect a certain behavior across pickle protocols?

Thanks,
Jimmy




