Pickling instances of nested classes
Currently instances of nested classes can't be pickled. For old style classes unpickling fails to find the class:
>>> import cPickle
>>> class Foo:
...     class Bar:
...         pass
...
>>> cPickle.loads(cPickle.dumps(Foo.Bar()))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'Bar'
For new style classes, pickling itself fails:
>>> class Foo(object):
...     class Bar(object):
...         pass
...
>>> cPickle.loads(cPickle.dumps(Foo.Bar()))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed
I think this should be fixed (see below for use cases). There's an old bug report open for this (http://www.python.org/sf/633930).

Classes would need access to their fully qualified name, starting from the module level (perhaps in a new class attribute __fullname__?), and the pickle machinery would have to use this name instead of __name__ when storing class names.

The second part should be easy to implement. The first part is harder. It can't be done by changing metaclasses, because classes are created "inside out", i.e. in:

class Foo:
    class Bar:
        class Baz:
            pass

when the Baz class is created, the Bar class hasn't been assigned a name yet. Another problem is that it should only be done for classes *defined* inside other classes, i.e. in

class Foo:
    pass

class Bar:
    Baz = Foo

Foo should remain unchanged. For this I guess the parser would have to be changed to somehow keep a stack of currently "open" class definitions, so the __fullname__ attribute (or dict entry) can be set once a new class statement is encountered.

Of course the remaining interesting question is whether this change is worth it and whether there are use cases for it. Possible use cases require that:

1) There's a collection of named objects in the scope of a class.
2) These objects are subclasses of other (nested or global) classes.
3) There are instances of those classes.

There are two use cases that immediately come to mind: XML and ORMs.

For XML:

1) Those classes are the element types and the nested classes are the attributes.
2) Being able to define those attributes as separate classes makes it possible to implement custom functionality (e.g. for validation or for handling certain attribute types like URLs, colors etc.).
3) Those classes get instantiated when an XML tree is created or parsed.

A framework that does this (and my main motivation for writing this :)) is XIST (see http://www.livinglogic.de/Python/xist/).
For the ORM case: Each top-level class defines a table and the nested classes are the fields, i.e. something like this:

class person(Table):
    class firstname(Varchar):
        "The person's first name"
        null = False

    class lastname(Varchar):
        "The person's last name"
        null = False

    class password(Varchar):
        "Login password"
        def validate(self, value):
            if not any(c.islower() for c in value) or \
               not any(c.isupper() for c in value) or \
               not any(c.isdigit() for c in value):
                raise InvalidField("password requires a mix of upper and lower"
                                   "case letters and digits")

Instances of these classes are the records read from the database. A framework that does something similar to this (although AFAIK fields are not nested classes) is SQLObject (http://sqlobject.org/).

So is this change wanted? useful? implementable with reasonable effort? Or just not worth it?

Bye,
   Walter Dörwald
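
As a rough illustration of the naming scheme proposed in this message, the following sketch resolves a dotted class name by walking attributes from the module level. It assumes Python 3.3 or later, where classes carry a `__qualname__` attribute that plays the role of the hypothetical `__fullname__`. The `resolve` helper and the `SimpleNamespace` stand-in for the defining module are assumptions made for this example and are not part of pickle.

```python
from functools import reduce
from types import SimpleNamespace


class Foo:
    class Bar:
        class Baz:
            pass


class Other:
    Alias = Foo  # an alias only: Foo was not defined inside Other


def resolve(root, dotted_name):
    """Walk a dotted name one attribute at a time, starting from root."""
    return reduce(getattr, dotted_name.split("."), root)


# Stand-in for the defining module, so the sketch does not depend on
# how the script is executed.
module = SimpleNamespace(Foo=Foo, Other=Other)

full_name = Foo.Bar.Baz.__qualname__
print(full_name)
print(resolve(module, full_name) is Foo.Bar.Baz)

# A flat lookup by __name__ alone cannot find the nested class.
print(hasattr(module, Foo.Bar.Baz.__name__))

# The alias keeps the name of the place where the class was defined.
print(Other.Alias.__qualname__)
```

The alias case at the end corresponds to the requirement that Foo "should remain unchanged" when it is merely assigned inside another class body.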
Walter Dörwald wrote:
So is this change wanted? useful? implementable with reasonable effort? Or just not worth it?
I think it is just not worth it. This means I won't attempt to implement it. I think I defined originally the __module__ attribute for classes to support better pickling (and defined it to be a string to avoid cycles); we considered the nested classes case at the time and concluded "oh well, don't do that then". Regards, Martin
Martin v. Löwis wrote:
Walter Dörwald wrote:
So is this change wanted? useful? implementable with reasonable effort? Or just not worth it?
I think it is just not worth it. This means I won't attempt to implement it. I think I defined originally the __module__ attribute for classes to support better pickling (and defined it to be a string to avoid cycles); we considered the nested classes case at the time and concluded "oh well, don't do that then".
This sounds like this change would have a far greater chance of getting into Python if a patch existed (I guess, this is the case for all changes ;)). Bye, Walter Dörwald
Walter Dörwald wrote:
For XML: 1) Those classes are the element types and the nested classes are the attributes. 2) Being able to define those attributes as separate classes makes it possible to implement custom functionality (e.g. for validation or for handling certain attribute types like URLs, colors etc.) and 3) Those classes get instantiated when an XML tree is created or parsed. A framework that does this (and my main motivation for writing this :)) is XIST (see http://www.livinglogic.de/Python/xist/).
For the ORM case: Each top level class defines a table and the nested classes are the fields, i.e. something like this:
class person(Table):
    class firstname(Varchar):
        "The person's first name"
        null = False

    class lastname(Varchar):
        "The person's last name"
        null = False

    class password(Varchar):
        "Login password"
        def validate(self, value):
            if not any(c.islower() for c in value) or \
               not any(c.isupper() for c in value) or \
               not any(c.isdigit() for c in value):
                raise InvalidField("password requires a mix of upper and lower"
                                   "case letters and digits")
Instances of these classes are the records read from the database. A framework that does something similar to this (although AFAIK fields are not nested classes) is SQLObject (http://sqlobject.org/).
So is this change wanted? useful? implementable with reasonable effort? Or just not worth it?
Notice that in these cases metaclasses are often involved, or could easily be. So if pickling honored __reduce__ or __reduce_ex__ on metaclasses (which right now it doesn't, treating their instances as normal classes), one could roll her own solution without burdening the language with implementing pickling of nested classes in general. So I think it would make more sense to add support for honoring __reduce__/__reduce_ex__ on metaclasses.
Samuele Pedroni wrote:
Walter Dörwald wrote:
[Use cases for pickling instances of nested classes] So is this change wanted? useful? implementable with reasonable effort? Or just not worth it?
Notice that in these cases metaclasses are often involved, or could easily be. So if pickling honored __reduce__ or __reduce_ex__ on metaclasses (which right now it doesn't, treating their instances as normal classes), one could roll her own solution without burdening the language with implementing pickling of nested classes in general. So I think it would make more sense to add support for honoring __reduce__/__reduce_ex__ on metaclasses.
Sorry, I don't understand: In most cases it can be possible to work around the nested classes problem by implementing custom pickling functionality (getstate/setstate/reduce/reduce_ex). But it is probably impossible to implement this once and for all in a common base class, because there's no way to find the real name of the nested class (or any other handle that makes it possible to retrieve the class from the module on unpickling). And having the full name of the class available would certainly help in debugging. Bye, Walter Dörwald
Walter Dörwald wrote:
Samuele Pedroni wrote:
Walter Dörwald wrote:
[Use cases for pickling instances of nested classes] So is this change wanted? useful? implementable with reasonable effort? Or just not worth it?
Notice that in these cases metaclasses are often involved, or could easily be. So if pickling honored __reduce__ or __reduce_ex__ on metaclasses (which right now it doesn't, treating their instances as normal classes), one could roll her own solution without burdening the language with implementing pickling of nested classes in general. So I think it would make more sense to add support for honoring __reduce__/__reduce_ex__ on metaclasses.
Sorry, I don't understand: In most cases it can be possible to work around the nested classes problem by implementing custom pickling functionality (getstate/setstate/reduce/reduce_ex). But it is probably impossible to implement this once and for all in a common base class, because there's no way to find the real name of the nested class (or any other handle that makes it possible to retrieve the class from the module on unpickling).
And having the full name of the class available would certainly help in debugging.
That's probably the only plus point, but the names would be confusing wrt modules vs. classes.

My point was that enabling reduce hooks at the metaclass level probably has other interesting applications, is far less complicated to implement than your proposal, does not further complicate the notion of what happens at class creation time, and indeed avoids the implementation costs (for all Python implementations) of your proposal, while still allowing fairly generic solutions to the problem at hand, because the solution can be formulated at the metaclass level.

If pickle.py is patched along these lines [*] (strawman implementation, not much tested, but test_pickle.py still passes; it needs further work to support __reduce_ex__, and cPickle would need similar changes), then this example works:
class HierarchMeta(type):
    """metaclass such that inner classes know their outer class, with
    pickling support"""
    def __new__(cls, name, bases, dic):
        sub = [x for x in dic.values() if isinstance(x,HierarchMeta)]
        newtype = type.__new__(cls, name, bases, dic)
        for x in sub:
            x._outer_ = newtype
        return newtype

    def __reduce__(cls):
        if hasattr(cls, '_outer_'):
            return getattr, (cls._outer_, cls.__name__)
        else:
            return cls.__name__

# uses the HierarchMeta metaclass
class Elm:
    __metaclass__ = HierarchMeta

    def __init__(self, **stuff):
        self.__dict__.update(stuff)

    def __repr__(self):
        return "<%s %s>" % (self.__class__.__name__, self.__dict__)

# example
class X(Elm):
    class Y(Elm):
        pass

    class Z(Elm):
        pass

import pickle

x = X(a=1)
y = X.Y(b=2)
z = X.Z(c=3)

xs = pickle.dumps(x)
ys = pickle.dumps(y)
zs = pickle.dumps(z)

print pickle.loads(xs)
print pickle.loads(ys)
print pickle.loads(zs)
pedronis$ python2.4 example.py
Samuele Pedroni wrote:
[...] And having the full name of the class available would certainly help in debugging.
that's probably the only plus point but the names would be confusing wrt modules vs. classes.
You'd probably need a different separator in repr. XIST does this:

>>> from ll.xist.ns import html
>>> html.a.Attrs.href

My point was that enabling reduce hooks at the metaclass level probably has other interesting applications, is far less complicated to implement than your proposal, does not further complicate the notion of what happens at class creation time, and indeed avoids the implementation costs (for all Python implementations) of your proposal, while still allowing fairly generic solutions to the problem at hand, because the solution can be formulated at the metaclass level.
Pickling classes like objects (i.e. by using the pickling methods in their (meta-)classes) solves only the second part of the problem: Finding the nested classes in the module on unpickling. The other problem is to add additional info to the inner class, which gets pickled and makes it findable on unpickling.
If pickle.py is patched along these lines [*] (strawman impl, not much tested but test_pickle.py still passes, needs further work to support __reduce_ex__ and cPickle would need similar changes) then this example works:
class HierarchMeta(type):
    """metaclass such that inner classes know their outer class, with
    pickling support"""
    def __new__(cls, name, bases, dic):
        sub = [x for x in dic.values() if isinstance(x,HierarchMeta)]
I did something similar to this in XIST, but the problem with this approach is that in:

class Foo(Elm):
    pass

class Bar(Elm):
    Baz = Foo

the class Foo will get its _outer_ set to Bar although it shouldn't.
[...]
    def __reduce__(cls):
        if hasattr(cls, '_outer_'):
            return getattr, (cls._outer_, cls.__name__)
        else:
            return cls.__name__
I like this approach: Instead of hardcoding how references to classes are pickled (pickle the __name__), delegate it to the metaclass. BTW, if classes and functions are picklable, why aren't modules:
>>> import urllib, cPickle
>>> cPickle.dumps(urllib.URLopener)
'curllib\nURLopener\np1\n.'
>>> cPickle.dumps(urllib.splitport)
'curllib\nsplitport\np1\n.'
>>> cPickle.dumps(urllib)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/copy_reg.py", line 69, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle module objects
We'd just have to pickle the module name. Bye, Walter Dörwald
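
A minimal sketch of what "just pickle the module name" could look like, assuming Python 3 and a per-pickler dispatch table. The helper names `reduce_module` and `dumps_with_modules` are hypothetical and not part of the standard library. Unpickling simply re-imports the module by name.

```python
import copyreg
import importlib
import io
import json
import pickle
import types


def reduce_module(module):
    """Reduce a module to 'import it again by name' on unpickling."""
    return importlib.import_module, (module.__name__,)


def dumps_with_modules(obj):
    """Pickle obj with a private dispatch table that also handles modules."""
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # Copy the global table so the registration stays local to this pickler.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.ModuleType] = reduce_module
    pickler.dump(obj)
    return buffer.getvalue()


data = dumps_with_modules({"codec": json})
restored = pickle.loads(data)
print(restored["codec"] is json)
```

Only the module's name travels in the pickle, so the receiving side must be able to import a module of that name.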
Walter Dörwald wrote:
Samuele Pedroni wrote:
[...] And having the full name of the class available would certainly help in debugging.
that's probably the only plus point but the names would be confusing wrt modules vs. classes.
You'd probably need a different separator in repr. XIST does this:

>>> from ll.xist.ns import html
>>> html.a.Attrs.href

My point was that enabling reduce hooks at the metaclass level probably has other interesting applications, is far less complicated to implement than your proposal, does not further complicate the notion of what happens at class creation time, and indeed avoids the implementation costs (for all Python implementations) of your proposal, while still allowing fairly generic solutions to the problem at hand, because the solution can be formulated at the metaclass level.
Pickling classes like objects (i.e. by using the pickling methods in their (meta-)classes) solves only the second part of the problem: Finding the nested classes in the module on unpickling. The other problem is to add additional info to the inner class, which gets pickled and makes it findable on unpickling.
If pickle.py is patched along these lines [*] (strawman impl, not much tested but test_pickle.py still passes, needs further work to support __reduce_ex__ and cPickle would need similar changes) then this example works:
class HierarchMeta(type):
    """metaclass such that inner classes know their outer class, with
    pickling support"""
    def __new__(cls, name, bases, dic):
        sub = [x for x in dic.values() if isinstance(x,HierarchMeta)]
I did something similar to this in XIST, but the problem with this approach is that in:
class Foo(Elm):
    pass

class Bar(Elm):
    Baz = Foo
the class Foo will get its _outer_ set to Bar although it shouldn't.
This should approximate that behavior better: [not tested]

import sys
....
    def __new__(cls, name, bases, dic):
        sub = [x for x in dic.values() if isinstance(x,HierarchMeta)]
        newtype = type.__new__(cls, name, bases, dic)
        for x in sub:
            if not hasattr(x, '_outer_') and \
               getattr(sys.modules.get(x.__module__), x.__name__, None) is not x:
                x._outer_ = newtype
        return newtype
.....

We don't set _outer_ if a way to pickle the class is already there.
Samuele Pedroni wrote:
[...]
this should approximate that behavior better: [not tested]
import sys
....
    def __new__(cls, name, bases, dic):
        sub = [x for x in dic.values() if isinstance(x,HierarchMeta)]
        newtype = type.__new__(cls, name, bases, dic)
        for x in sub:
            if not hasattr(x, '_outer_') and \
               getattr(sys.modules.get(x.__module__), x.__name__, None) is not x:
                x._outer_ = newtype
        return newtype
.....
we don't set _outer_ if a way to pickle the class is already there
This doesn't fix

class Foo:
    class Bar:
        pass

class Baz:
    Bar = Foo.Bar

but this should be a simple fix.

Bye,
   Walter Dörwald
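
One possible shape for such a fix, sketched for Python 3.3 or later rather than the Python 2.4 used in this thread: the metaclass adopts a class only when its `__qualname__` says it was defined directly inside the body being created. This guard is an assumption of the sketch and not the check from the messages above. It covers both alias cases raised in the thread.

```python
class HierarchMeta(type):
    """Metaclass recording the outer class of classes defined in its body."""

    def __new__(mcls, name, bases, namespace):
        newtype = super().__new__(mcls, name, bases, namespace)
        for value in namespace.values():
            if not isinstance(value, HierarchMeta):
                continue
            # Adopt only classes whose qualified name places them directly
            # inside this class; aliases keep their original qualified name.
            expected = newtype.__qualname__ + "." + value.__name__
            if value.__qualname__ == expected:
                value._outer_ = newtype
        return newtype


class Elm(metaclass=HierarchMeta):
    pass


class Foo(Elm):
    class Bar(Elm):
        pass


class Baz(Elm):
    Bar = Foo.Bar  # alias of a nested class
    Other = Foo    # alias of a top-level class


print(Foo.Bar._outer_ is Foo)
print(Baz.Bar._outer_ is Foo)
print(hasattr(Foo, "_outer_"))
```

Because the guard compares names rather than identities, a class that is deliberately renamed after creation would need its `__qualname__` kept in sync for this to keep working.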