[Python-Dev] pickling subclasses of types (Re: [Python-checkins] CVS: python/dist/src/Lib xmlrpclib.py,1.11,1.12)

Guido van Rossum guido@python.org
Thu, 11 Oct 2001 05:02:05 -0400


> > Incidentally, it looks like the XML-RPC code won't work with
> > subclasses of built-in types.  It does dispatch on the type() of the
> > object, but a subclass of string won't have type StringType.  It seems to
> > me, though, that it should be marshallable using XML-RPC.
> 
> You have a good point there: AFAIK, pickle and marshal both
> do the same thing. Perhaps at least pickle should be adapted 
> to treat all objects like instances or at least treat subclasses 
> as instances so they make it across to the unpickling end (Guido,
> perhaps you already have code in place which does this ?!).

As of 2.2a4, pickle fully supports subclasses of built-in types.  This
is done without any changes to pickle or cPickle: the 'object' class
has a __reduce__ method that "does the right thing" (it is implemented
by calling copy_reg._reduce).  The only unsupported feature is
__slots__: attributes described by __slots__ aren't saved (this could
be fixed by adding more code to copy_reg._reduce).
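
A minimal sketch of that round trip, assuming a present-day Python 3
interpreter rather than 2.2a4 (the class name Tagged and its encoding
attribute are hypothetical, chosen only for illustration). It shows the
default reduce machinery carrying both the string value and the
instance attributes across pickling:

```python
import pickle


class Tagged(str):
    """Hypothetical str subclass that carries extra per-instance state."""


original = Tagged("hello world")
original.encoding = "latin-1"  # stored in the instance __dict__, not in __slots__

# No custom __reduce__ is defined; the inherited default is relied upon.
restored = pickle.loads(pickle.dumps(original))

print(type(restored).__name__, str(restored), restored.encoding)
```

Both the subclass identity and the extra attribute should survive,
which is the "save all the object's state" behaviour pickle aims for.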

Marshal happens to use a series of PyXxx_Check(obj) tests rather than
a switch on obj->ob_type, so it already handles subclasses of the
built-in types it supports.

Note that marshal and pickle make different choices about what to do
for instances of subclasses of built-in types: marshal treats these as
the corresponding base type, while pickle attempts to save *all* the
object's state.

I'm not sure what choice XMLRPC should make: it attempts to marshal
the state of classic instances as structures, but these are
unmarshalled as dicts, not as instances.  It should probably be
extended to do the same for instances of subclasses of 'object', but
I'm not sure I'm comfortable with it losing some of the object's state
in the case of subclasses of built-in types: some state would be lost
in either case.  For example:

  class C(str):
    def __init__(self, value):  # must accept the value passed to C(...)
      self.encoding = "ascii"
    def set_encoding(self, enc):
      self.encoding = enc

  a = C("hello world")
  a.set_encoding("latin-1")

If we choose to marshal 'a' as a string, the encoding is lost; if we
choose to marshal it as a record, the data is lost!

This actually answers the question for me: *if* a new-style class is a
subclass of a built-in type that xmlrpclib supports, its instances
should be marshalled as the corresponding base type; otherwise it
should be marshalled as a struct.
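
The rule above can be sketched roughly as follows. This is not
xmlrpclib's actual code: the function wire_form, the tuple
SUPPORTED_BASES, and the choice of base types in it are assumptions made
for illustration, and the sketch is written for a present-day Python 3.

```python
# Base types this sketch treats as "supported" (an assumption, not
# xmlrpclib's real dispatch table).  bool comes before int because
# bool is itself a subclass of int.
SUPPORTED_BASES = (bool, int, float, str, list, tuple, dict)


def wire_form(obj):
    """Return (kind, value) describing how obj would be marshalled.

    Instances of a supported built-in type, or of a subclass of one,
    are reduced to the plain base type; anything else becomes a
    struct built from its instance attributes.
    """
    for base in SUPPORTED_BASES:
        if isinstance(obj, base):
            return base.__name__, base(obj)  # extra attributes are dropped
    return "struct", dict(getattr(obj, "__dict__", {}))


class C(str):
    pass


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


a = C("hello world")
a.encoding = "latin-1"

print(wire_form(a))            # marshalled as a plain string
print(wire_form(Point(1, 2)))  # marshalled as a struct
```

Using isinstance() instead of comparing type() is what lets the str
subclass through; the cost, as noted above, is that its encoding
attribute does not make it across.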

--Guido van Rossum (home page: http://www.python.org/~guido/)