pickle.reduce and deconstruct functions

This was [posted on -ideas][1], but apparently many people didn't see it because of the GMane migration going on at exactly the same time. At any rate, Antoine Pitrou suggested it should be discussed on -dev instead. And this gives me a chance to edit it (apparently it was markdown-is enough to confuse Hyperkitty and turn into a mess). Pickling uses an extensible protocol that lets any class determine how its instances can be deconstructed and reconstructed. Both `pickle` and `copy` use this protocol, but it could be useful more generally. Unfortunately, to use it more generally requires relying on undocumented details. I think we should expose a couple of helpers to fix that: # Return the same (shallow) reduction tuple that pickle.py, copy.py, and _pickle.c would use pickle.reduce(obj) -> (callable, args[, state[, litems[, ditem[, statefunc]]]]) # Return a callable and arguments to construct a (shallow) equivalent object # Raise a TypeError when that isn't possible pickle.deconstruct(obj) -> callable, args, kw So, why do you want these? There are many cases where you want to "deconstruct" an object if possible. For example: * Pattern matching depends on being able to deconstruct objects like this * Auto-generating a `__repr__` as suggested in [Chris Angelico's -ideas thread][2]. * Quick&dirty REPL stuff, and deeper reflection stuff using `inspect.signature` and friends. Of course not every type tells `pickle` what to do in an appropriate way that we can use, but a pretty broad range of types do, including (I think; I haven't double-checked all of them) `@dataclass`, `namedtuple`, `@attr.s`, many builtin and extension types, almost all reasonable types that use `copyreg`, and any class that pickles via the simplest customization hook `__getnewargs[_ex]__`. That's more than enough to be useful. 
And, just as important, it won't (except in intentionally pathological cases) give us a false positive, where a type is correctly pickleable and we think we can deconstruct it but the deconstruction is wrong. (For some uses, you are going to want to fall back to heuristics that are often right but sometimes misleadingly wrong, but I don't think the `pickle` module should offer anything like that. Maybe `inspect` should, but I'm not proposing that here.)

The way to get the necessary information isn't fully documented, and neither is the way to interpret it. And I don't think it _should_ be documented, because it changes every so often, and for good reasons; we don't want anyone writing third-party code that relies on those details. Plus, a different Python implementation might conceivably do it differently. Public helpers exposed from `pickle` itself won't have those problems.

Here's a first take at the code:

```python
def reduce(obj, proto=pickle.DEFAULT_PROTOCOL):
    """reduce(obj) -> (callable, args[, state[, listitems[, dictitems[, statefunc]]]])

    Return the same reduction tuple that the pickle and copy modules use.
    """
    cls = type(obj)
    if reductor := copyreg.dispatch_table.get(cls):
        return reductor(obj)
    # Note that this is not a special method call (it is looked up on the
    # instance, not on the type).
    if reductor := getattr(obj, "__reduce_ex__", None):
        return reductor(proto)
    if reductor := getattr(obj, "__reduce__", None):
        return reductor()
    raise TypeError(f"{cls.__name__} objects are not reducible")

def deconstruct(obj):
    """deconstruct(obj) -> callable, args, kwargs

    callable(*args, **kwargs) will construct an equivalent object.
    """
    reduction = reduce(obj)
    # If any of the optional members are included, pickle/copy has to
    # modify the object after construction, so there is no useful single
    # call we can deconstruct to.
    if any(reduction[2:]):
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    func, args, *_ = reduction
    # Many types (including @dataclass, namedtuple, and many builtins)
    # use copyreg.__newobj__ as the constructor func. The args tuple is
    # the type (or, when appropriate, some other registered
    # constructor) followed by the actual args. However, any function
    # with the same name will be treated the same way (because under the
    # covers, this is optimized to a special opcode).
    if func.__name__ == "__newobj__":
        return args[0], args[1:], {}
    # Types that implement __getnewargs_ex__ (and mainly only those) use
    # copyreg.__newobj_ex__ as the constructor func. The args tuple
    # holds the type, *args tuple, and **kwargs dict. Again, this is
    # special-cased by name.
    if func.__name__ == "__newobj_ex__":
        return args
    # If any other special copyreg functions are added in the future,
    # this code won't know how to handle them, so bail.
    if func.__module__ == 'copyreg':
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    # Otherwise, the type implements a custom __reduce__ or __reduce_ex__,
    # and whatever it specifies as the constructor is the real constructor.
    return func, args, {}
```

Actually looking at the logic as code, I think it makes a better argument for why we don't want to make all the internal details public. :)

Here are some quick (completely untested) examples of other things we could build on it.
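To make the behavior concrete, here is a stripped-down, runnable sketch of the same logic; it goes straight to `__reduce_ex__` and skips the `copyreg.dispatch_table` lookup, so it is a simplification of the proposal, not the proposed implementation:

```python
from collections import namedtuple

def deconstruct(obj):
    # Simplified stand-in for the proposed pickle.deconstruct: calls
    # __reduce_ex__ directly and skips the dispatch_table lookup.
    reduction = obj.__reduce_ex__(2)
    if any(reduction[2:]):
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    func, args, *_ = reduction
    if func.__name__ == "__newobj__":
        return args[0], args[1:], {}
    if func.__name__ == "__newobj_ex__":
        return args
    if func.__module__ == "copyreg":
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    return func, args, {}

Point = namedtuple("Point", "x y")
func, args, kw = deconstruct(Point(1, 2))
print(func is Point, args, kw)  # True (1, 2) {}
assert func(*args, **kw) == Point(1, 2)  # round-trips to an equal object
```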
```python
# in inspect.py
def deconstruct(obj):
    """deconstruct(obj) -> callable, bound_args

    Calling the callable on the bound_args would construct an equivalent
    object.
    """
    func, args, kw = pickle.deconstruct(obj)
    sig = inspect.signature(func)
    return func, sig.bind(*args, **kw)

# in reprlib.py, for your __repr__ to delegate to
def auto_repr(obj):
    func, bound_args = inspect.deconstruct(obj)
    args = itertools.chain(
        map(repr, bound_args.args),
        (f"{key}={value!r}" for key, value in bound_args.kwargs.items()))
    return f"{func.__name__}({', '.join(args)})"

# or maybe as a class decorator
def auto_repr(cls):
    def __repr__(self):
        func, bound_args = inspect.deconstruct(self)
        args = itertools.chain(
            map(repr, bound_args.args),
            (f"{key}={value!r}" for key, value in bound_args.kwargs.items()))
        return f"{func.__name__}({', '.join(args)})"
    cls.__repr__ = __repr__
    return cls
```

[1]: https://mail.python.org/archives/list/python-ideas@python.org/thread/RTZGM7L...
[2]: https://mail.python.org/archives/list/python-ideas@python.org/thread/37UNEDR...
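As a sanity check on the `auto_repr` idea, here is a self-contained version that inlines a minimal `deconstruct` (handling only the common `__newobj__` case) so it can actually run today; `_deconstruct` is a stand-in for the proposed helper, not an existing API:

```python
import inspect
import itertools
from collections import namedtuple

def _deconstruct(obj):
    # Minimal stand-in for the proposed pickle.deconstruct; handles only
    # the common copyreg.__newobj__ case.
    func, args, *rest = obj.__reduce_ex__(2)
    if any(rest) or func.__name__ != "__newobj__":
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    return args[0], args[1:], {}

def auto_repr(obj):
    func, args, kw = _deconstruct(obj)
    bound = inspect.signature(func).bind(*args, **kw)
    parts = itertools.chain(
        map(repr, bound.args),
        (f"{key}={value!r}" for key, value in bound.kwargs.items()))
    return f"{func.__name__}({', '.join(parts)})"

Point = namedtuple("Point", "x y")
print(auto_repr(Point(1, 2)))  # Point(1, 2)
```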

Some additional things that might be worth doing. I believe this exposes enough to allow people to build an object graph walker out of the `pickle`/`copy` protocol without having to access fragile internals, and without restricting future evolution of the internals of the protocol. See [the same -ideas thread][1] again for details on how it could be used and why.

These would all be changes to the `copy` module, together with the changes to `pickle` and `copyreg` in the previous message:

```python
class Memo(dict):
    """Memo is a mapping that can be used to do memoization exactly the
    same way deepcopy does, so long as you only use ids as keys and only
    use these operations:

        y = memo.get(id(x), default)
        memo[id(x)] = y
        memo.keep_alive(x)
    """
    def keep_alive(self, x):
        self.setdefault(id(self), []).append(x)

def reconstruct(x, memo: Memo, reduction, *, recurse=deepcopy):
    """reconstruct(x, memo, reduction, recurse=recursive_walker)

    Constructs a new object from the reduction by calling
    recursive_walker on each value. The reduction should have been
    obtained as pickle.reduce(x), and the memo should be a Memo instance
    (which will be passed to each recursive_walker call).
    """
    return _reconstruct(x, memo, *reduction, deepcopy=recurse)

def copier(cls):
    """copier(cls) -> func

    Returns a function func(x, memo, recurse) that can be used to copy
    objects of type cls without reducing and reconstructing them, or
    None if there is no such function.
    """
    if c := _deepcopy_dispatch.get(cls):
        return c
    if issubclass(cls, type):
        return _deepcopy_atomic
```

Also, all of the private functions that are stored in `_deepcopy_dispatch` would rename their `deepcopy` parameter to `recurse`, and the two that don't have such a parameter would add it.

[1]: https://mail.python.org/archives/list/python-ideas@python.org/thread/RTZGM7L...
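To illustrate the memoization pattern `Memo` is meant to support, here is a toy graph walker (my own example, not from the proposal) that uses only the three id-keyed operations listed in the docstring to preserve shared structure, the same way `deepcopy` does:

```python
class Memo(dict):
    """Sketch of the proposed Memo helper, as described above."""
    def keep_alive(self, x):
        # deepcopy-style keep-alive list, keyed by the memo's own id
        self.setdefault(id(self), []).append(x)

def walk(x, memo):
    # A toy recursive walker that copies nested lists, memoizing each
    # node so shared (or cyclic) structure is copied only once.
    if (y := memo.get(id(x))) is not None:
        return y
    if isinstance(x, list):
        y = []
        memo[id(x)] = y  # register before recursing, to handle cycles
        y.extend(walk(item, memo) for item in x)
        memo.keep_alive(x)  # keep the original alive so its id stays unique
        return y
    return x  # atomic values are returned as-is

shared = [1, 2]
original = [shared, shared]
copied = walk(original, Memo())
print(copied == original, copied[0] is copied[1])  # True True
```

A real walker would call `pickle.reduce` / `reconstruct` instead of special-casing lists, but the memo discipline would be exactly this.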

I don't have anything to add on the side of the technical specifics, but I do want to point out another benefit of adding these APIs: a significant improvement in testability, since you will be able to test the reduction and reconstruction behaviours directly, rather than having to infer what is going on from the round-trip behaviour.

Cheers, Nick.

participants (2)

- Andrew Barnert
- Nick Coghlan