Change in unpickle order in 2.2?
I have an application (Grouch) that has to do a lot of trickery at pickle-time and unpickle-time, and as a result it happens to be sensitive to the order of unpickling.

(The reason for the pickle-time intervention is that Grouch stores type objects in its data structure, and you can't pickle type objects. So it hangs on to a representative value of the type for pickling -- eg. for the "integer" type, it keeps both IntType and 0 in memory, but only pickles 0, and uses type(0) to get IntType back at unpickle time.)

The reason that Grouch is sensitive to the order of unpickling is because its data structure is a gnarly, incestuous knot of mutually interdependent classes, and I stopped tinkering with the pickle code as soon as I got something that worked with Python 2.0 and 2.1. Now it fails under 2.2. Under 2.1, it appears that certain more-deeply nested objects were unpickled first; under 2.2, that is no longer the case, and that screws up Grouch's test suite.

Anyone got a vague, hand-waving explanation for my vague, hand-waving complaint? Or should I try to come up with a test case?

Thanks -- Greg

-- Greg Ward - software developer gward@mems-exchange.org MEMS Exchange http://www.mems-exchange.org
Greg Ward wrote:
I have an application (Grouch) that has to do a lot of trickery at pickle-time and unpickle-time, and as a result it happens to be sensitive to the order of unpickling.
What's Grouch ?
(The reason for the pickle-time intervention is that Grouch stores type objects in its data structure, and you can't pickle type objects. So it hangs on to a representative value of the type for pickling -- eg. for the "integer" type, it keeps both IntType and 0 in memory, but only pickles 0, and uses type(0) to get IntType back at unpickle time.)
Why don't you use a special reduce function which takes the tp_name as index into the types module ? Storing strings should avoid all complicated type object saving.
The reason that Grouch is sensitive to the order of unpickling is because its data structure is a gnarly, incestuous knot of mutually interdependent classes, and I stopped tinkering with the pickle code as soon as I got something that worked with Python 2.0 and 2.1. Now it fails under 2.2. Under 2.1, it appears that certain more-deeply nested objects were unpickled first; under 2.2, that is no longer the case, and that screws up Grouch's test suite.
Anyone got a vague, hand-waving explanation for my vague, hand-waving complaint? Or should I try to come up with a test case?
You should probably first check whether the pickle string is identical in 2.1 and 2.2 and then go on from there. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/
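One way to make that comparison concrete is sketched below. It assumes a modern Python 3 with the pickletools module (which did not exist in 2.1 or 2.2), and describe_pickle is a hypothetical helper name, not anything from Grouch. The idea is to run the same script under each interpreter with a fixed protocol and diff the digests and opcode listings.

```python
import hashlib
import io
import pickle
import pickletools


def describe_pickle(obj, protocol=2):
    """Return (sha256 hex digest, opcode listing) for obj's pickle.

    The protocol is pinned so that two interpreters are compared on
    equal terms; protocol 2 is an arbitrary choice for this sketch.
    """
    data = pickle.dumps(obj, protocol=protocol)
    listing = io.StringIO()
    # pickletools.dis writes one line per opcode, in stream order.
    pickletools.dis(data, out=listing)
    return hashlib.sha256(data).hexdigest(), listing.getvalue()


digest, listing = describe_pickle({"spam": [1, 2, 3]})
print(digest)
print(listing)
```

If the digests differ between two runs, the opcode listings show where the streams diverge, which is usually quicker than reading raw pickle strings.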
On 10 January 2002, M.-A. Lemburg said:
What's Grouch ?
Grouch is a system for 1) describing a Python object schema, and 2) traversing an existing object graph (eg. a pickle or ZODB) to ensure that it conforms to that object schema.

An object schema is a collection of classes (including the attributes in each class and the type of each attribute), atomic types, and type aliases. An atomic type is a type with no sub-types; by default every Grouch schema has five atomic types: int, string, long, complex, and float. You can easily add new atomic types, eg. the MEMS Exchange virtual fab has mxDateTime as an atomic type. A type alias is just what it sounds like, eg. "Foo" might be an alias for "foo.Foo" (a fully qualified class name representing a Grouch instance type), and "real" might be an alias for "int|long|float" (a Grouch union type).

See http://www.mems-exchange.org/software/grouch/

Anyways, that's not terribly relevant, but it gives me an excuse to plug my most arcane and (IMHO) interesting Python hack.

[me]
(The reason for the pickle-time intervention is that Grouch stores type objects in its data structure, and you can't pickle type objects. So it hangs on to a representative value of the type for pickling -- eg. for the "integer" type, it keeps both IntType and 0 in memory, but only pickles 0, and uses type(0) to get IntType back at unpickle time.)
[MAL]
Why don't you use a special reduce function which takes the tp_name as index into the types module ? Storing strings should avoid all complicated type object saving.
I'm not sure I understand what you're saying. Are you just suggesting that, when I need to pickle IntType, I pickle the string "int" instead of the integer 0? I don't see how that makes any difference: I still need to intercede at pickle/unpickle time to make this happen. Also, the fact that type(x).__name__ is not consistent across Python versions or implementations (Jython) screws this up. Grouch now has its own canonical set of type names because of this, and I could easily reverse that dictionary to make a typename->typeobject mapping. But I don't see how pickling "int" is a win over pickling 0, when what I *really* want to do is pickle IntType.
You should probably first check whether the pickle string is identical in 2.1 and 2.2 and then go on from there.
Excellent idea -- thanks! Greg -- Greg Ward - nerd gward@python.net http://starship.python.net/~gward/ "Eine volk, eine reich, eine führer" --Hitler "One world, one web, one program" --Microsoft
Greg Ward wrote:
On 10 January 2002, M.-A. Lemburg said:
What's Grouch ?
[Grouch is a system for 1) describing a Python object schema, and 2) traversing an existing object graph (eg. a pickle or ZODB) to ensure that it conforms to that object schema.]
Sounds very interesting :-)
[me]
(The reason for the pickle-time intervention is that Grouch stores type objects in its data structure, and you can't pickle type objects. So it hangs on to a representative value of the type for pickling -- eg. for the "integer" type, it keeps both IntType and 0 in memory, but only pickles 0, and uses type(0) to get IntType back at unpickle time.)
[MAL]
Why don't you use a special reduce function which takes the tp_name as index into the types module ? Storing strings should avoid all complicated type object saving.
I'm not sure I understand what you're saying. Are you just suggesting that, when I need to pickle IntType, I pickle the string "int" instead of the integer 0?
Right. It needn't be 'int', any string will do as long as you have a mapping from strings to type objects.
I don't see how that makes any difference: I still need to intercede at pickle/unpickle time to make this happen.
Well, I suppose with the new Python 2.2 version you could add a special __reduce__ method to type objects which takes care of this for you. For older versions, you should probably register a pickle handler for type objects which does the same. Pickle should then use this handler for pickling the type object.
Also, the fact that type(x).__name__ is not consistent across Python versions or implementations (Jython) screws this up. Grouch now has its own canonical set of type names because of this, and I could easily reverse that dictionary to make a typename->typeobject mapping. But I don't see how pickling "int" is a win over pickling 0, when what I *really* want to do is pickle IntType.
True, but it saves you the trouble of storing global references to the type constructors in the pickle. Your system will do the mapping using the above hooks. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/
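The string-for-type mapping discussed above can be sketched as follows. This is not Grouch's code, and it swaps in a different hook from the one suggested: instead of a __reduce__ method or a registered pickle handler, it uses pickle's persistent_id / persistent_load hooks as they exist in modern Python 3. The name tables and the "atomic-type" tag are assumptions made up for the illustration.

```python
import io
import pickle

# Hypothetical canonical names, standing in for Grouch's own table.
NAME_TO_TYPE = {"int": int, "float": float, "complex": complex, "string": str}
TYPE_TO_NAME = {t: n for n, t in NAME_TO_TYPE.items()}


class TypeNamePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Known type objects are stored as a tagged canonical name.
        if isinstance(obj, type) and obj in TYPE_TO_NAME:
            return ("atomic-type", TYPE_TO_NAME[obj])
        return None  # everything else is pickled the normal way


class TypeNameUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, name = pid
        if tag != "atomic-type":
            raise pickle.UnpicklingError("unknown persistent id: %r" % (pid,))
        # Map the stored name back to the live type object.
        return NAME_TO_TYPE[name]


def dumps(obj):
    buf = io.BytesIO()
    TypeNamePickler(buf, protocol=2).dump(obj)
    return buf.getvalue()


def loads(data):
    return TypeNameUnpickler(io.BytesIO(data)).load()


schema = {"count": int, "ratio": float, "label": str}
restored = loads(dumps(schema))
```

The pickle then carries only the canonical names, so the mapping lives in one place and does not depend on type(x).__name__ being stable across Python versions or implementations.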
I have an application (Grouch) that has to do a lot of trickery at pickle-time and unpickle-time, and as a result it happens to be sensitive to the order of unpickling.
(The reason for the pickle-time intervention is that Grouch stores type objects in its data structure, and you can't pickle type objects. So it hangs on to a representative value of the type for pickling -- eg. for the "integer" type, it keeps both IntType and 0 in memory, but only pickles 0, and uses type(0) to get IntType back at unpickle time.)
The reason that Grouch is sensitive to the order of unpickling is because its data structure is a gnarly, incestuous knot of mutually interdependent classes, and I stopped tinkering with the pickle code as soon as I got something that worked with Python 2.0 and 2.1. Now it fails under 2.2. Under 2.1, it appears that certain more-deeply nested objects were unpickled first; under 2.2, that is no longer the case, and that screws up Grouch's test suite.
Anyone got a vague, hand-waving explanation for my vague, hand-waving complaint? Or should I try to come up with a test case?
Yes please, and post it to SourceForge. There aren't that many changes in the source of pickle.py since release 2.1. (Or are you using cPickle? If so, please say so. The two aren't 100% equivalent.) I see changes related to unicode, and type objects are pickled differently in 2.2. There's also a change that refuses to pickle a "global" (a reference by module and object name, used for classes, types and functions) when the name that the object claims to have doesn't refer to the same object. There's a new test on __safe_for_unpickling__. Hm, I think you must be using cPickle, I don't know enough about it to help. --Guido van Rossum (home page: http://www.python.org/~guido/)
Yes please, and post it to SourceForge. There aren't that many changes in the source of pickle.py since release 2.1.
I think there have been changes to the order in which things come out of a dictionary, which could affect pickling order. Unpickling order, of course, should strictly follow the order in which things are in the file. Regards, Martin
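A small illustration of that point, assuming Python 3.7 or later, where dict iteration follows insertion order (in 2.1 and 2.2 it depended on hash-table internals instead): two equal dictionaries that iterate in different orders produce different pickle streams, and each stream is replayed in the order it was written.

```python
import pickle

# Equal as dictionaries, but they iterate in different orders.
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

pickled_a = pickle.dumps(a, protocol=2)
pickled_b = pickle.dumps(b, protocol=2)

same_value = (a == b)                     # dict equality ignores order
same_bytes = (pickled_a == pickled_b)     # the streams follow iteration order

# Unpickling rebuilds the items in the order they appear in the stream.
order_a = list(pickle.loads(pickled_a))
order_b = list(pickle.loads(pickled_b))
print(same_value, same_bytes, order_a, order_b)
```

Any unpickle-time code that assumes one member of a dictionary is restored before another is therefore relying on iteration order at pickle time, not on anything pickle itself guarantees.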
[me]
I have an application (Grouch) that has to do a lot of trickery at pickle-time and unpickle-time, and as a result it happens to be sensitive to the order of unpickling. [...] Anyone got a vague, hand-waving explanation for my vague, hand-waving complaint? Or should I try to come up with a test case?
[Guido]
Yes please, and post it to SourceForge. There aren't that many changes in the source of pickle.py since release 2.1. (Or are you using cPickle? If so, please say so. The two aren't 100% equivalent.)
Tried it with both pickle and cPickle, with the same result (ie. one of my test cases failed with the exact same traceback, apparently for the same reason). I'll see if I can't reduce this to something that doesn't rely on 1500 hairy lines of Grouch code. (Only fitting that something named for Oscar the Grouch is hairy, eh?) Greg -- Greg Ward - Linux weenie gward@python.net http://starship.python.net/~gward/ A man without religion is like a fish without a bicycle.
[me]
I have an application (Grouch) that has to do a lot of trickery at pickle-time and unpickle-time, and as a result it happens to be sensitive to the order of unpickling. [...] Anyone got a vague, hand-waving explanation for my vague, hand-waving complaint? Or should I try to come up with a test case?
[Guido]
Yes please, and post it to SourceForge. There aren't that many changes in the source of pickle.py since release 2.1. (Or are you using cPickle? If so, please say so. The two aren't 100% equivalent.)
False alarm. It appears that a change in dictionary order bit me; I was lucky that pickling Grouch objects ever worked at all. Lesson: when the code to support pickling is too complex to understand, it's too complex. Hmmm, that might have broader application. ;-) Greg -- Greg Ward - Linux geek gward@python.net http://starship.python.net/~gward/ Time flies like an arrow; fruit flies like a banana.
[Greg Ward]
False alarm. It appears that a change in dictionary order bit me; I was lucky that pickling Grouch objects ever worked at all.
You were luckier we changed dict iteration order for your own good <wink>.
Lesson: when the code to support pickling is too complex to understand, it's too complex. Hmmm, that might have broader application. ;-)
No, I'm sure Zope Corporation would officially deny, denounce and decry any intimation that convolution in support of pickling is a vice. The true problem is more likely that you haven't yet added enough layers of abstraction around your pickling code. I'm especially suspicious of that because you were able to figure out the cause of the problem in less than a week ...
participants (6)
- Greg Ward
- Greg Ward
- Guido van Rossum
- M.-A. Lemburg
- Martin v. Loewis
- Tim Peters