pickling a subclass of tuple
Alex Martelli
aleaxit at yahoo.com
Sat Jan 1 11:28:55 EST 2005
fedor <nobody at here.com> wrote:
> Hi all, happy new year,
>
> I was trying to pickle an instance of a subclass of a tuple when I ran
> into a problem. Pickling doesn't work with HIGHEST_PROTOCOL. How should
> I rewrite my class so I can pickle it?
You're falling afoul of an optimization in pickle's protocol 2, which is
documented in pickle.py as follows:
# A __reduce__ implementation can direct protocol 2 to
# use the more efficient NEWOBJ opcode, while still
# allowing protocol 0 and 1 to work normally. For this to
# work, the function returned by __reduce__ should be
# called __newobj__, and its first argument should be a
# new-style class. The implementation for __newobj__
# should be as follows, although pickle has no way to
# verify this:
#
#     def __newobj__(cls, *args):
#         return cls.__new__(cls, *args)
#
# Protocols 0 and 1 will pickle a reference to __newobj__,
# while protocol 2 (and above) will pickle a reference to
# cls, the remaining args tuple, and the NEWOBJ code,
# which calls cls.__new__(cls, *args) at unpickling time
# (see load_newobj below). If __reduce__ returns a
# three-tuple, the state from the third tuple item will be
# pickled regardless of the protocol, calling __setstate__
# at unpickling time (see load_build below).
Essentially, and simplifying just a little...: you're inheriting
__reduce_ex__ (because you're not overriding it), but you ARE overriding
__new__ *and changing its signature* -- so, the inherited __reduce_ex__
is used, and, with this protocol 2 optimization, it essentially assumes
that __new__ is similarly used -- or, at least, that a __new__ is used
which does not arbitrarily change the signature!
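To make that concrete (a minimal sketch -- I'm guessing at the exact
shape of your class, and the precise wording of the TypeError varies
with the Python version), the dump itself goes through, but the load
blows up when the NEWOBJ step calls your __new__ with tuple's idea of
the arguments:
>>> import pickle
>>> class A(tuple):
...     def __new__(klass, arg1, arg2):
...         return super(A, klass).__new__(klass, (arg1, arg2))
...
>>> pickle.loads(pickle.dumps(A(1, 2), 2))
Traceback (most recent call last):
  ...
TypeError: __new__() takes exactly 3 arguments (2 given)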
So, if you want to change __new__'s signature, and yet be picklable by
protocol 2, you have to override __reduce_ex__ to return the right
"args"... those your class's __new__ expects!
For example, you could consider something like...:
def __newobj__(cls, *args):
    return cls.__new__(cls, *args)

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))
    def __reduce_ex__(self, proto=0):
        if proto >= 2:
            return __newobj__, (A, self[0], self[1])
        else:
            return super(A, self).__reduce_ex__(proto)
Note the key difference in A's __reduce_ex__ (for proto=2) wrt tuple's
(which is the same as object's) -- that's after an "import a" where a.py
has this code as well as an 'a = A(1, 2)'...:
>>> a.a.__reduce_ex__(2)
(<function __newobj__ at 0x3827f0>, (<class 'a.A'>, 1, 2))
>>> tuple.__reduce_ex__(a.a, 2)
(<function __newobj__ at 0x376770>, (<class 'a.A'>, (1, 2)), {}, None, None)
>>>
Apart from the additional tuple items (not relevant here), tuple's
reduce returns args as (<class 'a.A'>, (1, 2)) -- two items: the class
and the tuple value; so with protocol 2 this ends up calling A.__new__(A,
(1,2))... BOOM, because, differently from tuple.__new__, YOUR override
doesn't accept this signature! So, I suggest tweaking A's reduce so it
returns args as (<class 'a.A'>, 1, 2)... apparently the only signature
you're willing to accept in your A.__new__ method.
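Just to confirm the tweak does the job -- assuming a.py holds the A
above (module-level __newobj__ plus the overridden __reduce_ex__) along
with the same 'a = A(1, 2)':
>>> import pickle, a
>>> b = pickle.loads(pickle.dumps(a.a, 2))
>>> b
(1, 2)
>>> type(b) is a.A
True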
Of course, if A.__new__ can have some flexibility, you COULD have it
accept the same signature as tuple.__new__ and then you wouldn't have to
override __reduce_ex__. Or, you could override __reduce_ex__ in other
ways, say:
def __reduce_ex__(self, proto=0):
    if proto >= 2:
        proto = 1
    return super(A, self).__reduce_ex__(proto)
this would avoid the specific optimization that's tripping you up due to
your signature-change in __new__.
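Assuming a.py now holds this variant (same __new__ as before, this
downgrading __reduce_ex__, and the usual 'a = A(1, 2)'), even an
explicit request for protocol 2 quietly takes the older, slower path
and round-trips fine:
>>> import pickle, a
>>> pickle.loads(pickle.dumps(a.a, 2))
(1, 2)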
The best solution may be to forget __reduce_ex__ and take advantage of
the underdocumented special method __getnewargs__ ...:
class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))
    def __getnewargs__(self):
        return self[0], self[1]
This way, you're essentially choosing to explicitly tell the "normal"
__reduce_ex__ about the particular arguments you want to be used for the
__new__ call needed to reconstruct your object on unpickling! This
highlights even better the crucial difference, due strictly to the
change in __new__'s signature...:
>>> a.a.__getnewargs__()
(1, 2)
>>> tuple.__getnewargs__(a.a)
((1, 2),)
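And with the __getnewargs__ version in a.py (again with 'a = A(1, 2)'),
the ordinary inherited __reduce_ex__ does the whole job,
HIGHEST_PROTOCOL included:
>>> import pickle, a
>>> b = pickle.loads(pickle.dumps(a.a, pickle.HIGHEST_PROTOCOL))
>>> b
(1, 2)
>>> type(b) is a.A
True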
It IS, I guess, somewhat unfortunate that you have to understand
pickling in some depth to let you change __new__'s signature and yet
fully support pickling... on the other hand, when you're overriding
__new__ you ARE messing with some rather deep infrastructure,
particularly if you alter its signature so that it doesn't accept
"normal" calls any more, so it's not _absurd_ that compensatory depth of
understanding is required;-).
Alex