Pickle breakage with reduce of recursive structures (was: AssertionError in pickle's memoize function)

[Followups to python-dev, please.] [Michael Hohn]
[Tim Peters]
Assertions are never expected to fail, so "something impossible happened" when they do fail.
[Michael Hohn]
Here is a code sample that shows the problem I ran into:
Summary for the OP: This is a bug in Python. Using cPickle won't help, but if you don't subclass builtin container types others than list, dict, and tuple, using pickle protocol 2 should work. The rest of this message is for python-dev. The simplest breaking case is: t = type('t', (list,), {}) obj = t() obj.append(obj) pickle.dumps(obj) [infinite recursion] The subclass causes save_reduce to be used instead of save_list. For proto < 2, copy_reg._reduce_ex returns (_reconstructor, list(a)), and the args--list(a)--cycle back through obj. Initially it looks like this should be okay, but the args are saved before obj is memoized, and obj can't be memoized until REDUCE can be executed with the args--and there's the cycle. It is even more obviously impossible from the unpickler's perspective because it has to call _reconstructor([obj]) to create obj! There are two separate problems: 1. Any __reduce__ implementation that returns args that cycle back through the object it tried to reduce hasn't done its job. As described above, _reduce_ex is one such implementation. reduce_2 avoids this by using the listitems and dictitems parameters. Since that's a pickler-side feature it can be used in _reduce_ex too. The basetype(obj) hook (documented in PEP 307) would remain for immutable bases; it doesn't work for containers, but user-defined containers already have to implement their own reduce functions. POC patch: http://www.trit.org/~dima/home/reduce_ex.diff At least the set and deque types also have this problem. 2. The pickle implementations don't detect reduction cycles. Pickling an instance of this obviously broken class causes an infinite recursion: class evil(object): def __reduce__(self): return evil, (self,) It's easy to detect this case. POC patch for the pickle module: http://www.trit.org/~dima/home/redcycle.diff BTW, the failed assert the OP is seeing happens when the cycle goes through another object: t = type('t', (list,), {}) obj = t() d = {'obj': obj} obj.append(d) pickle.dumps(obj) [AssertionError] cPickle has the same problem, but it lacks the assert, so it writes garbage instead: new = cPickle.loads(cPickle.dumps(obj)) new[0]['obj'] is new -> False # wrong obj[0]['obj'] is obj -> True # right This makes the reduction cycle check (#2 above) more than just cosmetic since if cPickle had that assert (it should) it would've been a crash. Right now it's garbage output instead, which is arguably worse. Formally complete versions of the above patches will be on SF tomorrow unless someone suggests better alternatives. Dima.

On Sun, Oct 31, 2004, Dima Dorfman wrote:
Very nice! Please file a bug report on SF. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ WiFi is the SCSI of the 21st Century -- there are fundamental technical reasons for sacrificing a goat. (with no apologies to John Woods)

On Sun, Oct 31, 2004, Dima Dorfman wrote:
Very nice! Please file a bug report on SF. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ WiFi is the SCSI of the 21st Century -- there are fundamental technical reasons for sacrificing a goat. (with no apologies to John Woods)
participants (2)
-
Aahz
-
Dima Dorfman