---------- Forwarded message ----------
From: tomer filiba firstname.lastname@example.org
Date: Jan 24, 2007 10:45 PM
Subject: new pickle semantics/API
To: Pythonemail@example.com
i'm having great trouble in RPyC with pickling object proxies. several users have asked for this feature, but no matter how hard i try to "bend the truth", pickle always complains. it uses type(obj) for dispatching, which reveals that the object is actually a proxy rather than a real object.
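here is a minimal sketch of why "bending the truth" doesn't help (the Real and Proxy classes and the DISPATCH table are hypothetical, not RPyC's actual code). a proxy can fake __class__, which fools isinstance(), but type() always reports the proxy's own class, so any table keyed on type(obj) misses it:

```python
class Real:
    """Stands in for an object living in the remote interpreter."""

    def __init__(self, value):
        self.value = value


class Proxy:
    """Hypothetical local proxy that forwards attribute access to its target."""

    def __init__(self, target):
        # store the target directly, bypassing any attribute forwarding
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        # only called for attributes not found on the proxy itself
        return getattr(self._target, name)

    # lie about the class: isinstance() falls back to checking __class__
    __class__ = property(lambda self: type(self._target))


# a type-keyed dispatch table, in the spirit of pickle's internal one
DISPATCH = {Real: lambda obj: ("Real", obj.value)}

p = Proxy(Real(42))
print(p.value)                  # 42, forwarded to the target
print(isinstance(p, Real))      # True, thanks to the fake __class__
print(type(p).__name__)         # Proxy: type() is not fooled
print(DISPATCH.get(type(p)))    # None: dispatching on type() misses the proxy
```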
recap: RPyC uses local proxies that refer to objects in a remote interpreter (another process or machine).
as you may have noticed, every RPC framework has its own serializer: for example banana/jelly in twisted, a bunch of XML serializers, and whatnot.
for RPyC i wrote yet another serializer, but it serves different purposes, so it's not relevant to the issue at hand.
what i want is a standard serialization *API*. the idea is that any framework could make use of this API, and that it would be generic enough to eliminate copy_reg and other misfortunes. this also means the builtin types would have to implement this API.
- - - - - - - -
for example, currently the builtin types don't support __reduce__, and pickle has to use its own internal registry for them. moreover, __reduce__ is very pickle-specific (e.g., its sibling __reduce_ex__ takes the protocol number). what i'm after is an API for "simplifying" complex objects into simpler parts.
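to make the protocol coupling concrete: __reduce__ itself takes no arguments, but pickle looks up __reduce_ex__ and calls it with the protocol number it is writing, so an object's reduction can depend on pickle's own format version. a small sketch against today's Python 3 pickle module (the Recorder class is hypothetical):

```python
import pickle


class Recorder:
    """Hypothetical class that logs the protocol pickle hands to it."""

    def __init__(self, x):
        self.x = x
        self.protocols = []

    def __reduce_ex__(self, protocol):
        # pickle passes the protocol number it is writing, so the
        # reduction may vary with the serializer's format version
        self.protocols.append(protocol)
        # rebuild as a plain int so unpickling never needs this class
        return (int, (self.x,))


r = Recorder(7)
for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
    assert pickle.loads(pickle.dumps(r, protocol=proto)) == 7

print(r.protocols)   # [0, 2, pickle.HIGHEST_PROTOCOL]
```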
here's the API i'm suggesting:
    def __getstate__(self):
        # return a tuple of (type(self), obj), where obj is a simplified
        # version of self
        ...

    @classmethod
    def __setstate__(cls, state):
        # return an instance of cls, with the given state
        ...
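a minimal sketch of a user-defined class following the proposed semantics, assuming the simplified state is just a tuple of the fields (Point is purely illustrative). since these dunder names already mean something else to today's pickle, the sketch calls them directly instead of handing Point to pickle:

```python
class Point:
    """Illustrative class implementing the proposed __getstate__/__setstate__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # proposed semantics: return (type(self), simplified state)
        return type(self), (self.x, self.y)

    @classmethod
    def __setstate__(cls, state):
        # proposed semantics: build and return a new instance from the state
        x, y = state
        return cls(x, y)


cls, state = Point(1, 2).__getstate__()
p = cls.__setstate__(state)
print(cls.__name__, state, (p.x, p.y))   # Point (1, 2) (1, 2)
```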
well, you may already know these two, although their semantics here are different from the current ones. but wait, there's more!
the idea is to have the following simple building blocks:
* integers (int/long)
* strings (str)
* arrays (tuples)
all picklable objects should be able to express themselves as a collection of these building blocks. of course this will be recursive, i.e., object X could simplify itself as object Y, where object Y might undergo further simplification, until we are left with building blocks only.
for example:
* int - return self
* float - string in the format "[+-]X.YYYe[+-]EEE"
* complex - two floats
* tuple - tuple of its simplified elements
* list - tuple of its simplified elements
* dict - a tuple of (key, value) tuples
* set - a tuple of its items
* file - raises TypeError("can't be simplified")
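the rules above, combined with the recursive "X simplifies to Y, Y simplifies to building blocks" idea, can be sketched as a single simplify() function. assumptions of this sketch, not of the proposal: Python 3's int stands in for both int and long; floats use Python's "e" format with 17 significant digits, which round-trips but writes at least two (not exactly three) exponent digits; files are detected via io.IOBase; and for any other object, the type returned by the proposed __getstate__ is recorded by class name, since the proposal doesn't say how types are encoded. Celsius and Reading are hypothetical:

```python
import io

# Python 3's int covers both int and long from the list above
BUILDING_BLOCKS = (int, str)


def is_simplified(obj):
    """True if obj is made only of ints, strs and tuples of those."""
    if type(obj) in BUILDING_BLOCKS:
        return True
    if type(obj) is tuple:
        return all(is_simplified(item) for item in obj)
    return False


def simplify(obj):
    """Recursively reduce obj to building blocks, per the rules above."""
    t = type(obj)
    if t in BUILDING_BLOCKS:
        return obj
    if t is float:
        # 17 significant digits are enough for the string to round-trip
        return format(obj, "+.16e")
    if t is complex:
        return (simplify(obj.real), simplify(obj.imag))
    if t in (tuple, list, set):
        return tuple(simplify(item) for item in obj)
    if t is dict:
        return tuple((simplify(k), simplify(v)) for k, v in obj.items())
    if isinstance(obj, io.IOBase):
        raise TypeError("can't be simplified")
    # any other object: assume it implements the proposed __getstate__,
    # whose state may need further simplification (X -> Y -> blocks)
    cls, state = obj.__getstate__()
    return (cls.__name__, simplify(state))


class Celsius:
    """Y: hypothetical class that simplifies straight to building blocks."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __getstate__(self):
        return type(self), (self.degrees,)


class Reading:
    """X: hypothetical class whose state still contains a Celsius (Y)."""

    def __init__(self, sensor, temp):
        self.sensor = sensor
        self.temp = temp

    def __getstate__(self):
        return type(self), (self.sensor, self.temp)


print(simplify({"a": [1, 2.5]}))
# (('a', (1, '+2.5000000000000000e+00')),)
print(simplify(Reading("probe-1", Celsius(21))))
# ('Reading', ('probe-1', ('Celsius', (21,))))
```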
all in all, i choose to call this *simplification* rather than *serialization*, as serialization is more about converting the simplified objects into a sequence of bytes. my suggestion leaves that part to the implementers of specific serializers.
so this is how a typical serializer (e.g., pickle) would be implemented:
* define its version of a "recursive simplifier"
* optionally use a "memo" to remember objects that were already visited (so they would be serialized by reference rather than by value)
* define its flavor of converting ints, strings, and arrays to bytes (binary, textual, etc. etc.)
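the three steps above, in a toy serializer. everything concrete here is invented for illustration: simplify_one() handles only lists and dicts and tags them "list"/"dict", and the textual wire format (i<int>; for ints, s<len>:<utf-8 bytes> for strings, t<len>: followed by the items for tuples, r<n>; for a back-reference) is made up. the memo assigns reference numbers in first-visit order, so a decoder could rebuild the same table while reading:

```python
def simplify_one(value):
    """One simplification step for the containers this toy handles."""
    if type(value) is list:
        return ("list", tuple(value))
    if type(value) is dict:
        return ("dict", tuple(value.items()))
    raise TypeError(f"can't simplify {type(value).__name__}")


def serialize(obj):
    """Toy serializer: recursive simplifier + memo + textual encoding."""
    memo = {}   # id(object) -> (reference number, object kept alive)
    out = []

    def emit(value):
        if type(value) is int:
            out.append(b"i%d;" % value)
        elif type(value) is str:
            data = value.encode("utf-8")
            out.append(b"s%d:" % len(data) + data)
        elif type(value) is tuple:
            out.append(b"t%d:" % len(value))
            for item in value:
                emit(item)
        else:
            # a non-primitive object: write a back-reference if seen before
            key = id(value)
            if key in memo:
                out.append(b"r%d;" % memo[key][0])
                return
            # register *before* recursing so self-references terminate
            memo[key] = (len(memo), value)
            emit(simplify_one(value))

    emit(obj)
    return b"".join(out)


shared = [1, 2]
print(serialize([shared, shared]))
# b't2:s4:listt2:t2:s4:listt2:i1;i2;r1;'
```

registering an object in the memo before recursing into it is what lets self-referencing containers terminate: a list that contains itself serializes as b't2:s4:listt1:r0;'.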
- - - - - - - -
the default implementation of __getstate__, in object.__getstate__, will simply return self.__dict__ and the values of any __slots__.
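a sketch of what that default could look like, following the (type(self), obj) convention from the API above. the function name default_getstate is hypothetical; it walks the MRO so slots declared in base classes are included, applies Python's name mangling to private slot names, and skips slots that were never assigned. the resulting dict would then be simplified further like any other dict:

```python
def default_getstate(obj):
    """Sketch of a default object.__getstate__: __dict__ plus slot values."""
    state = dict(getattr(obj, "__dict__", {}))
    for klass in type(obj).__mro__:
        slots = klass.__dict__.get("__slots__", ())
        if isinstance(slots, str):          # __slots__ = "x" is allowed too
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            attr = name
            if name.startswith("__") and not name.endswith("__"):
                # private slot names are mangled like any private attribute
                attr = "_" + klass.__name__.lstrip("_") + name
            if hasattr(obj, attr):          # skip slots never assigned
                state[attr] = getattr(obj, attr)
    return type(obj), state


class Base:
    __slots__ = ("x",)


class Child(Base):
    __slots__ = ("y",)


class Plain:
    def __init__(self):
        self.a = 1


c = Child()
c.x, c.y = 1, 2
print(default_getstate(c)[1])        # {'y': 2, 'x': 1}
print(default_getstate(Plain())[1])  # {'a': 1}
```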
this removes the need for __reduce__, __reduce_ex__, and copy_reg, and simplifies pickle greatly. it does require, however, adding simplification support to all builtin types... but this doesn't call for much code:

    # Python-style sketch of the C-level simplifier for lists
    def PyList_GetState(self):
        state = tuple(PyObject_GetState(item) for item in self)
        return PyListType, state
also note that it makes the copy module much simpler:

    def copy(obj):
        # __getstate__ returns (type, simplified state)
        cls, state = obj.__getstate__()
        return cls.__setstate__(state)
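a runnable sketch of copy() under the proposed (type, state) convention, using a hypothetical Account class. it also shows a consequence worth keeping in mind: because this __getstate__ returns the tags list itself rather than a simplified tuple, the copy shares that list, i.e. it is a shallow copy. a deep copy would simplify the state recursively before rebuilding:

```python
def copy(obj):
    """Copy obj through the proposed API: simplify, then rebuild."""
    cls, state = obj.__getstate__()
    return cls.__setstate__(state)


class Account:
    """Hypothetical class following the proposed semantics."""

    def __init__(self, owner, tags):
        self.owner = owner
        self.tags = tags

    def __getstate__(self):
        return type(self), (self.owner, self.tags)

    @classmethod
    def __setstate__(cls, state):
        owner, tags = state
        return cls(owner, tags)


a = Account("alice", ["rpyc"])
b = copy(a)
print(b is a, b.owner, b.tags is a.tags)   # False alice True
```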
- - - - - - - -
executive summary: simplifying object serialization and copying by revising __getstate__ and __setstate__, so that they return a "simplified" version of the object.
this new mechanism should become an official API for getting or setting the "contents" of objects (either builtin or user-defined).
with this unified mechanism in place, pickling proxy objects would work as expected.
if there's interest, i'll write a PEP-like document to explain all the semantics.