[Python-3000] new pickle semantics/API

tomer filiba tomerfiliba at gmail.com
Wed Jan 24 21:45:12 CET 2007


i'm having great trouble in RPyC with pickling object proxies.
several users have asked for this feature, but no matter how hard
i try to "bend the truth", pickle always complains. it dispatches
on type(obj), which "uncovers" that the object is actually a proxy
rather than a real object.

recap: RPyC uses local proxies that refer to objects of a
remote interpreter (another process/machine).
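
to make the problem concrete, here is a minimal sketch of such a proxy. the Proxy class below is a toy written for this illustration (it is not RPyC code): it wraps a local object instead of a remote one, and fakes __class__ so that isinstance() is fooled while type() is not.

```python
class Proxy(object):
    """toy local proxy: forwards attribute access to a wrapped object
    (a stand-in for a real remote reference; hypothetical, not RPyC's class)"""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # only called when normal lookup fails, so it forwards everything
        # the proxy does not define itself
        return getattr(self._target, name)

    @property
    def __class__(self):
        # "bend the truth": claim to be whatever the target is
        return type(self._target)


p = Proxy([1, 2, 3])
looks_like_list = isinstance(p, list)   # isinstance falls back to p.__class__
real_type = type(p)                     # type() ignores the __class__ property
```

any code that dispatches on type(obj), as pickle does, sees Proxy here rather than list, no matter what __class__ reports.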

if you've noticed, every RPC framework has its own serializer:
for example, banana/jelly in twisted, a bunch of XML
serializers, and whatnot.

for RPyC i wrote yet another serializer, but for different purposes,
so it's not relevant for the issue at hand.

what i want is a standard serialization *API*. the idea is that
any framework could make use of this API, and that it would
be generic enough to eliminate copy_reg and other misfortunes.
this also means the builtin types should support this API
themselves.

- - - - - - - -

for example, currently the builtin types don't support __reduce__,
and require pickle to use its own internal registry. moreover,
the reduce protocol is very pickle-specific (e.g., __reduce_ex__ takes
the protocol number). what i'm after is an API for "simplifying"
complex objects into simpler parts.

here's the API i'm suggesting:

def __getstate__(self):
    # return a tuple of (type(self), obj), where obj is a simplified
    # version of self
    ...

@classmethod
def __setstate__(cls, state):
    # return an instance of cls, with the given state
    ...

well, you may already know these two, although their
semantics are different. but wait, there's more!

the idea is of having the following simple building blocks:
* integers (int/long)
* strings (str)
* arrays (tuples)

all picklable objects should be able to express themselves
as a collection of these building blocks. of course this will be
recursive, i.e., object X could simplify itself as object Y,
where object Y might undergo further simplification, until we are
left with building blocks only.

for example:
* int - return self
* float - string in the format "[+-]X.YYYe[+-]EEE"
* complex - two floats
* tuple - tuple of its simplified elements
* list - tuple of its simplified elements
* dict - a tuple of (key, value) tuples
* set - a tuple of its items
* file - raises TypeError("can't be simplified")
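
the table above can be sketched as one recursive function. this is only an illustration under a few assumptions: simplify() is a hypothetical free function (the proposal would put this logic in each type's __getstate__), the type tags are left out, and "%+.17e" is just one way to get the "[+-]X.YYYe[+-]EEE" float format.

```python
def simplify(obj):
    """reduce obj to the building blocks: ints, strs and tuples"""
    if isinstance(obj, (int, str)):
        # building blocks pass through unchanged (bool counts as int)
        return obj
    if isinstance(obj, float):
        # sign, mantissa, exponent; enough digits to round-trip a double
        return "%+.17e" % obj
    if isinstance(obj, complex):
        # two floats, which are themselves simplified to strings
        return (simplify(obj.real), simplify(obj.imag))
    if isinstance(obj, (tuple, list, set, frozenset)):
        return tuple(simplify(item) for item in obj)
    if isinstance(obj, dict):
        return tuple((simplify(k), simplify(v)) for k, v in obj.items())
    # files and anything else we do not know how to take apart
    raise TypeError("can't be simplified")


nested = simplify([1, "a", (2, [3])])
pairs = simplify({"k": 1.5})
```

note that without the type tags the result is lossy (a list and a tuple simplify to the same thing), which is why the proposed __getstate__ returns type(self) along with the simplified value.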

all in all, i choose to call this *simplification* rather than *serialization*,
as serialization is more about converting the simplified objects into a
sequence of bytes. my suggestion leaves that part to the
implementers of specific serializers.

so this is how a typical serializer (e.g., pickle) would be implemented:
* define its version of a "recursive simplifier"
* optionally use a "memo" to remember objects that were already
  visited (so they would be serialized by reference rather than by value)
* define its flavor of converting ints, strings, and arrays to bytes
  (binary, textual, etc.)
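
a rough sketch of those three steps, for lists and tuples only. everything here is made up for illustration: the names simplify/encode/dumps, the ("ref", n) convention for the memo, and the textual byte format (loosely bencode-like) are assumptions, not part of pickle or of the proposal.

```python
def simplify(obj, memo):
    """recursive simplifier: returns ints, strs and tagged tuples"""
    if isinstance(obj, (int, str)):
        return obj
    if id(obj) in memo:
        # already visited: emit a reference instead of the value
        return ("ref", memo[id(obj)])
    memo[id(obj)] = len(memo)
    if isinstance(obj, (list, tuple)):
        items = tuple(simplify(item, memo) for item in obj)
        return (type(obj).__name__, items)
    raise TypeError("can't be simplified")


def encode(block):
    """this serializer's flavor of turning building blocks into bytes"""
    if isinstance(block, int):
        return b"i%de" % block
    if isinstance(block, str):
        data = block.encode("utf-8")
        return b"s%d:%s" % (len(data), data)
    if isinstance(block, tuple):
        return b"t" + b"".join(encode(item) for item in block) + b"e"
    raise TypeError("not a building block")


def dumps(obj):
    return encode(simplify(obj, {}))


shared = [1, 2]
tree = simplify([shared, shared], {})   # second occurrence becomes a ref
loop = []
loop.append(loop)                       # self-referential list
cyclic = simplify(loop, {})
```

the memo is what lets the self-referential list terminate; a different serializer could keep the same simplifier and swap in a binary encode().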

- - - - - - - -

the default implementation of __getstate__, in object.__getstate__,
will simply return self.__dict__ plus the values of any attributes
named in __slots__.

this removes the need for __reduce__, __reduce_ex__, and copy_reg,
and simplifies pickle greatly. it does require, however, adding
support for simplification for all builtin types... but this doesn't
call for much code:
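    def PyList_GetState(self):
        # python-flavored pseudocode for the C-level hook;
        # PyObject_GetState is the generic "simplify this object" call
        state = tuple(PyObject_GetState(item) for item in self)
        return PyListType, state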

also note that it makes the copy module much simpler:
    def copy(obj):
        state = obj.__getstate__()
        return type(obj).__setstate__(state)
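
to see the default implementation and copy() working together, here is a small self-contained sketch. the Simplifiable base class is a hypothetical stand-in for the proposed object.__getstate__/__setstate__; it handles __dict__ only (no __slots__) and copies shallowly. the method names follow the proposal, so their semantics differ from the hooks pickle uses today.

```python
class Simplifiable(object):
    """stand-in for the proposed defaults on object (hypothetical)"""

    def __getstate__(self):
        # proposed semantics: (type, simplified contents)
        return type(self), dict(self.__dict__)

    @classmethod
    def __setstate__(cls, state):
        # proposed semantics: build and return a new instance
        _, attrs = state
        obj = cls.__new__(cls)
        for name, value in attrs.items():
            setattr(obj, name, value)
        return obj


def copy(obj):
    state = obj.__getstate__()
    return type(obj).__setstate__(state)


class Point(Simplifiable):
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
q = copy(p)   # goes through __getstate__/__setstate__, not __init__
```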

- - - - - - - -

executive summary:
simplifying object serialization and copying by revising
__getstate__ and __setstate__, so that they produce and consume a
"simplified" version of the object.

this new mechanism should become the official API for
getting and setting the "contents" of objects (either builtin or
user-defined).

having this unified mechanism, pickling proxy objects would
work as expected.

if there's interest, i'll write a pep-like document to explain
all the semantics.



-tomer

