pickle.reduce and deconstruct
I just want to say, this would be extremely powerful.

A while ago I implemented a demo that wraps the pickle extension machinery to enable "object graph transformers": https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f (demo usage starts at line 307). Effectively it lets you match-and-transform arbitrary python object graphs that can include cycles and deep nesting, without writing any traversal logic yourself. In the demo I used it to implement a simple arithmetic expression parser/evaluator that lets you go from strings like "3 + (4 + (5) + 7)" to the int 19.

But, it uses pickle under the hood, because pickle is what knows how to traverse object graphs with cycles, and pickle is what knows how to use the extensible __reduce__ machinery. And this makes the entire demo silly and really slow. (And, that was kind of the point, to do something interesting using esoteric, inappropriate corners of python.)

If Andrew's proposal happened, pickle would be uncoupled from that core machinery, and my idea, as well as other interesting ideas, could be implemented in a non-silly way.

Daniel Spitz

On Thu, Jan 23, 2020 at 7:11 PM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Pickling uses an extensible protocol that lets any class determine how its instances can be deconstructed and reconstructed. Both `pickle` and `copy` use this protocol, but it could be useful more generally. Unfortunately, to use it more generally requires relying on undocumented details. I think we should expose a couple of helpers to fix that:
# Return the same (shallow) reduction tuple that pickle.py, copy.py, and _pickle.c would use
pickle.reduce(obj) -> (callable, args[, state[, litems[, ditem[, statefunc]]]])
# Return a callable and arguments to construct a (shallow) equivalent object
# Raise a TypeError when that isn't possible
pickle.deconstruct(obj) -> callable, args, kw
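For concreteness, here's roughly what I'd expect these to return for a simple namedtuple. This is hypothetical output (the helpers don't exist yet), and the exact shape of the reduction tuple is an implementation detail:

from collections import namedtuple
Point = namedtuple("Point", "x y")

# pickle.reduce(Point(1, 2)) would hand back whatever the existing machinery
# already produces, i.e. something like Point(1, 2).__reduce_ex__(2) today:
#     (copyreg.__newobj__, (Point, 1, 2), None, None, None)
# pickle.deconstruct(Point(1, 2)) would boil that down to:
#     (Point, (1, 2), {})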
So, why do you want these?
There are many cases where you want to "deconstruct" an object if possible. Pattern matching depends on being able to deconstruct objects like this. Auto-generating a `__repr__` as suggested in Chris's thread. Quick&dirty REPL stuff, and deeper reflection stuff using `inspect.Signature` and friends.
Of course not every type tells `pickle` what to do in an appropriate way that we can use, but a pretty broad range of types do, including (I think; I haven't double-checked all of them) `@dataclass`, `namedtuple`, `@attr.s`, many builtin and extension types, almost all reasonable types that use `copyreg`, and any class that pickles via the simplest customization hook `__getnewargs[_ex]__`. That's more than enough to be useful. And, just as important, it won't (except in intentionally pathological cases) give us a false positive, where a type is correctly pickleable and we think we can deconstruct it but the deconstruction is wrong. (For some uses, you are going to want to fall back to heuristics that are often right but sometimes misleadingly wrong, but I don't think the `pickle` module should offer anything like that. Maybe `inspect` should.)
The way to get the necessary information isn't fully documented, and neither is the way to interpret it. And I don't think it _should_ be documented, because it changes every so often, and for good reasons; we don't want anyone writing third-party code that relies on those details. Plus, a different Python implementation might conceivably do it differently. Public helpers exposed from `pickle` itself won't have those problems.
Here's a first take at the code.
import copyreg
import pickle

def reduce(obj, proto=pickle.DEFAULT_PROTOCOL):
    """reduce(obj) -> (callable, args[, state[, litems[, ditem[, statefunc]]]])

    Return the same reduction tuple that the pickle and copy modules use
    """
    cls = type(obj)
    if reductor := copyreg.dispatch_table.get(cls):
        return reductor(obj)
    # Note that this is not a special method call (not looked up on the type)
    if reductor := getattr(obj, "__reduce_ex__", None):
        return reductor(proto)
    if reductor := getattr(obj, "__reduce__", None):
        return reductor()
    raise TypeError(f"{cls.__name__} objects are not reducible")
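A quick (untested) sanity check; `Frac` here is just a made-up stand-in class, not anything from the stdlib. A type with a plain `__reduce__` ends up going through `__reduce_ex__`, which delegates to it:

class Frac:
    def __init__(self, num, den):
        self.num, self.den = num, den

    def __reduce__(self):
        # reconstruct an equivalent object from the constructor and two args
        return (Frac, (self.num, self.den))

# reduce(Frac(1, 2)) should return (Frac, (1, 2)):
# copyreg.dispatch_table has no entry for Frac, so the __reduce_ex__ branch
# runs, and object.__reduce_ex__ delegates to the overridden __reduce__.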
def deconstruct(obj):
    """deconstruct(obj) -> callable, args, kw

    callable(*args, **kw) will construct an equivalent object
    """
    reduction = reduce(obj)
    # If any of the optional members are included, pickle/copy has to
    # modify the object after construction, so there is no useful single
    # call we can deconstruct to.
    if any(reduction[2:]):
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    func, args, *_ = reduction
    # Most types (including @dataclass, namedtuple, and many builtins)
    # use copyreg.__newobj__ as the constructor func. The args tuple is
    # the type (or, when appropriate, some other registered constructor)
    # followed by the actual args. However, any function with the same
    # name will be treated the same way (because under the covers, this
    # is optimized to a special opcode).
    if func.__name__ == "__newobj__":
        return args[0], args[1:], {}
    # Types that implement __getnewargs_ex__ are the main users of
    # copyreg.__newobj_ex__ as the constructor func. The args tuple
    # holds the type, *args tuple, and **kwargs dict. Again, this is
    # special-cased by name.
    if func.__name__ == "__newobj_ex__":
        return args
    # If any other special copyreg functions are added in the future,
    # this code won't know how to handle them, so bail.
    if func.__module__ == 'copyreg':
        raise TypeError(f"{type(obj).__name__} objects are not deconstructable")
    # Otherwise, the type implements a custom __reduce__ or __reduce_ex__,
    # and whatever it specifies as the constructor is the real constructor.
    return func, args, {}
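And a quick round trip through `deconstruct` (again untested), using the `Point` namedtuple from earlier:

p = Point(1, 2)
func, args, kw = deconstruct(p)
# func is Point, args is (1, 2), kw is {}, so this rebuilds an equal object:
assert func(*args, **kw) == p

# By contrast, an ordinary class whose data lives in its instance __dict__
# reduces with a non-empty state dict, so deconstruct() raises TypeError:
class C:
    def __init__(self):
        self.x = 1
# deconstruct(C())  ->  TypeError: C objects are not deconstructable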
Actually looking at that code, I think it makes a better argument for why we don't want to make all the internal details public. :)
Here are some quick (completely untested) examples of other things we could build on it.
# in inspect
def deconstruct(obj):
    """deconstruct(obj) -> callable, bound_args

    Calling the callable on the bound_args would construct an equivalent object
    """
    func, args, kw = pickle.deconstruct(obj)
    sig = inspect.signature(func)
    return func, sig.bind(*args, **kw)
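(For the `Point` namedtuple from earlier, I'd expect `inspect.deconstruct(Point(1, 2))` to give back `Point` plus a `BoundArguments` whose `.arguments` maps `x` to 1 and `y` to 2. Again, completely untested.)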
# in reprlib, for your __repr__ to delegate to
def auto_repr(obj):
    func, bound_args = inspect.deconstruct(obj)
    args = itertools.chain(
        map(repr, bound_args.args),
        (f"{key}={value!r}" for key, value in bound_args.kwargs.items()))
    return f"{func.__name__}({', '.join(args)})"
# or maybe as a class decorator
def auto_repr(cls):
    def __repr__(self):
        func, bound_args = inspect.deconstruct(self)
        args = itertools.chain(
            map(repr, bound_args.args),
            (f"{key}={value!r}" for key, value in bound_args.kwargs.items()))
        return f"{func.__name__}({', '.join(args)})"
    cls.__repr__ = __repr__
    return cls
I hadn't seen the original message (perhaps it fell through the GMane migration). I think this would indeed be a worthwhile addition to pickle. Is it possible to continue this discussion on python-dev? I don't think this is a PEP-level topic.

Regards,

Antoine.
On Feb 7, 2020, at 09:49, Daniel Spitz <spitz.dan.l@gmail.com> wrote:
But, it uses pickle under the hood, because pickle is what knows how to traverse object graphs with cycles, and pickle is what knows how to use the extensible __reduce__ machinery. And this makes the entire demo silly and really slow. (And, that was kind of the point, to do something interesting using esoteric, inappropriate corners of python.)
If Andrew's proposal happened, pickle would be uncoupled from that core machinery, and my idea, as well as other interesting ideas, could be implemented in a non-silly way.
Not to argue against my own proposal here, but… I was only suggesting exposing the reduction machinery, and I think you also need the graph-walking machinery exposed. Or would my existing proposal on its own actually be enough for you?

That being said, if you have a use for the object graph walk, that shouldn’t be too hard to write. It’s just that there are nontrivial design issues to answer. Do we need to expose the graph itself in some form, or just the memoized walk iterator? Is an iterator the right way to expose it, as opposed to something like the Smalltalk-ish walker classes in the AST module? Should it iterate trivial objects (the core builtin types that pickle knows it doesn’t need to further reduce—but you can get their reduction anyway if you want)? Do we need to expose a function to let user code check whether an object is trivial in this sense? Should it iterate the objects themselves, or some kind of node object? Does it need to (optionally?) yield anything for references to already-visited nodes—and, if so, what does that look like? Should pickle and deepcopy be rewritten to use the public walker, or should they continue to inline the walk throughout their code and the walker is just a parallel implementation of the same logic—and, if the latter, do we need to guarantee that it always implements identical logic, or just “good” logic?

I took only the briefest look at your code (I’m on my phone); it’s possible that some of these already have obvious “it has to be this answer or I can’t write my transformer” answers. It’s also possible that some of these questions have answers that aren’t obvious from first principles but would become obvious as soon as we start implementing it. And this might be a fun weekend project, so maybe it’s worth just diving in to see what we run into, and then just ask -ideas about the remaining open questions. If you don’t want to try it, there’s a good chance I will, but no promises as to when.
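To make the "memoized walk iterator" option a little more concrete, here's a rough, untested sketch. It assumes the `pickle.reduce()` helper proposed earlier exists, treats only a handful of builtin types as trivial, and ignores the listitems/dictitems parts of the reduction and `__deepcopy__` entirely, so it's illustrative rather than a real design:

import pickle  # assuming the proposed pickle.reduce() helper existed

TRIVIAL = (type(None), bool, int, float, complex, str, bytes)

def walk(obj, _seen=None):
    """Yield every object reachable from obj, visiting each object only once."""
    if _seen is None:
        _seen = {}
    if id(obj) in _seen:
        return
    _seen[id(obj)] = obj  # memoize, and keep obj alive so its id stays unique
    yield obj
    if isinstance(obj, TRIVIAL):
        return
    if isinstance(obj, (list, tuple, set, frozenset)):
        children = list(obj)
    elif isinstance(obj, dict):
        children = [x for kv in obj.items() for x in kv]
    else:
        try:
            reduction = pickle.reduce(obj)  # the proposed helper
        except TypeError:
            return
        if isinstance(reduction, str):  # a global reference; nothing to recurse into
            return
        # recurse into the constructor args and, if present, the state
        children = list(reduction[1])
        if len(reduction) > 2 and reduction[2]:
            children.append(reduction[2])
    for child in children:
        yield from walk(child, _seen)

A transformer-style API would have to sit on top of something like this, and the reconstruction side (especially bottom-up, with cycles) is where most of the open questions above actually bite.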
On Fri, Feb 7, 2020 at 1:51 PM Andrew Barnert <abarnert@yahoo.com> wrote:
Not to argue against my own proposal here, but… I was only suggesting exposing the reduction machinery, and I think you also need the graph-walking machinery exposed. Or would my existing proposal on its own actually be enough for you?
Ah, good point; indeed, your existing proposal, to just expose a shallow reduce/deconstruct, would not be sufficient to support my project. However, I don't want to hijack your proposal; I do think a shallow form of this would be valuable and less controversial than full traversal and reduction of any object graph, and if that could make it in python-dev as Antoine suggests, it's worth doing.

That being said, if you have a use for the object graph walk, that shouldn't be too hard to write. It's just that there are nontrivial design issues to answer. Do we need to expose the graph itself in some form, or just the memoized walk iterator? Is an iterator the right way to expose it, as opposed to something like the Smalltalk-ish walker classes in the AST module?
(See my comments further down for more on this.) An entire graph structure isn't necessary for my use case. An iterator over values would be insufficient, because it wouldn't assist you in building up the replacement structure. An iterator over node objects could work but it might be weird to update them as you iterate. I think a walker is probably most appropriate.

Should it iterate trivial objects (the core builtin types that pickle knows it doesn't need to further reduce—but you can get their reduction anyway if you want)? Do we need to expose a function to let user code check whether an object is trivial in this sense?
So, my solution to this was a big hack – you pickle and unpickle *twice*. It would be great if a hack like that weren't necessary. So I'd be in favor of an option to visit trivial objects.

To explain the hack used: When pickling an object graph containing trivial values, you can't affect how the trivial values get reduced. However, you *can* affect how they get *unpickled*, by overriding find_class() on a custom Unpickler subclass. And so when you unpickle, you replace the trivial value with an instance of a special *subclass* of the trivial value's type. Then when you go to pickle for the *second* time, you can apply your custom transformation, because the value no longer gets skipped by the traversal machinery: its type is no longer the original trivial type, but a replica, "mock" subclass of it, with its own overridden version of __reduce__(). You can see it in action, transforming an int into a string here: https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f#file-gr...
Should it iterate the objects themselves, or some kind of node object? Does it need to (optionally?) yield anything for references to already-visited nodes—and, if so, what does that look like?
The way my library works is by ignoring the idea of traversal completely, and merely specifying which (shallow) transformations to apply to which types within an object graph. The traversal order is assumed to be "undefined" – as long as pickle visits everything, it doesn't matter what the order is. Objects in cycles or with multiple references to them are assumed to be visited only once, just like those objects only get reduced once during pickling.

I think abstractly what my library is doing is walking the nodes of the object graph (in an arbitrary, duplicate-free order), and doing replace operations on some of the nodes (according to their value's type) while preserving the rest of the deep structure. Deeper changes in structure are achieved by applying successive full-graph transformations iteratively (see the usage of .transform_iterative() in the demo, e.g. here: https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f#file-gr...). An AST walker, with the ability to register custom visitors and transformers, seems like the closer abstraction.

Should pickle and deepcopy be rewritten to use the public walker, or should they continue to inline the walk throughout their code and the walker is just a parallel implementation of the same logic—and, if the latter, do we need to guarantee that it always implements identical logic, or just "good" logic?
I think it would be great if pickle's traversal logic were exposed separately from pickle itself, but I imagine decoupling it in that way would put unnecessary pressure on Python devs to maintain a larger surface area of public functionality. (I'm not at all familiar with the existing code used for traversal, though I'm curious to read it now.) So I would opt for the second option: a parallel implementation that does its best to keep up with pickle's behavior. I would try to keep some of the behavior "undefined", like traversal order, so that users don't come to depend on a particular order if that ever changes.
It’s also possible that some of these questions have answers that aren’t obvious from first principles but would become obvious as soon as we start implementing it. And this might be a fun weekend project, so maybe it’s worth just diving in to see what we run into, and then just ask -ideas about the remaining open questions. If you don’t want to try it, there’s a good chance I will, but no promises as to when.
I agree it would be a fun weekend project :). However, my next few weekends are not available. If you're interested in coordinating on it, let me know and perhaps we can find a time to get started. You're also welcome to jump right in without me if you have the time and motivation.

Daniel Spitz
Daniel Spitz wrote:
On Fri, Feb 7, 2020 at 1:51 PM Andrew Barnert <abarnert@yahoo.com> wrote:

That being said, if you have a use for the object graph walk, that shouldn't be too hard to write. It's just that there are nontrivial design issues to answer. Do we need to expose the graph itself in some form, or just the memoized walk iterator? Is an iterator the right way to expose it, as opposed to something like the Smalltalk-ish walker classes in the AST module?

(See my comments further down for more on this.) An entire graph structure isn't necessary for my use case. An iterator over values would be insufficient, because it wouldn't assist you in building up the replacement structure. An iterator over node objects could work but it might be weird to update them as you iterate. I think a walker is probably most appropriate.
I think the simplest interface might just look like `copy.deepcopy` but with an extra `transformer` argument, which is called on each value. The `ast.NodeTransformer` style doesn't really work, because you have to register transformers for an arbitrary set of types that probably need qualified names (or might not even have public names), rather than for a fixed set of types from the `ast` module. So you're going to need a single `transform` method that explicitly switches on types (or, maybe better, uses `@singledispatch` to do it for you), at which point the class is just extra boilerplate.

The first question is, should this be top-down (like doing `copy.deepcopy`, but calling `transformer` on each original value before recursively deep-copying it) or bottom-up (like doing `pickle.loads(pickle.dumps())`, but calling `transformer` on each reconstructed value)? I think bottom-up is more generally useful, but there are some cases where it seems wrong. For example, if you want to transform every `Spam` into an `Eggs`, and `Spam` isn't reducible, top-down could still easily work, but bottom-up wouldn't. I suppose it could be an option (like `os.walk`), or even give you both (like `ElementTree.iterparse`)?

Also, it seems a bit weird that types that are deepcopyable with `__deepcopy__` but not pickleable—or that are both, but `__deepcopy__` is a major optimization—can't use `__deepcopy__`. But I'm not sure there's any way we could use that protocol. The `__deepcopy__` method just takes `self, memo` and usually works by calling `copy.deepcopy(attr, memo)` on all of its attributes, and there's no obvious clean way to swizzle that into calling our transforming-deepcopy with our transformer instead. If it's important to make that work, then it's worth checking whether I'm right about that rather than just accepting that I didn't see an obvious clean way…

One more question: you don't want to see duplicate values. If a value has been transformed, do you want the replacement value each time it shows up again? Or should a transformed value not go into the memo cache, so if the same value appears again it can be transformed differently? Or does that also need to be an option?

Finally, I think we'd want the `deepcopy` style of memoization, where we just rely on `id` to be persistent through the entire copy (which includes keeping alive any values that might get replaced along the way), not the `pickle` style that has to construct and use persistent IDs. So we don't need to use any of the pickle ID machinery.

Anyway, if you wanted to implement this today, you'd need access to multiple (and undocumented, and periodically changing) things. But I think all we'd need `copy` to expose to make this doable without being fragile is:

def reconstruct(x, memo, reduction, *, recurse=deepcopy):
    return _reconstruct(x, memo, *reduction, deepcopy=recurse)

def copier(cls):
    if copier := _deepcopy_dispatch.get(cls):
        return copier
    if issubclass(cls, type):
        return _deepcopy_atomic

And add a `deepcopy=deepcopy` parameter to `_deepcopy_atomic` and `_deepcopy_method`, and the latter would have to use it instead of the global, just like the other copier functions. Or, maybe better, rename it to `recurse` in all of them.
And with those changes to `copy`, plus the `pickle.reduce` function I suggested earlier, I believe you can write `transform` yourself like this (quick&dirty hack based on existing `copy.deepcopy`, which assumes you want bottom-up, and transformed values to be memoized, and no attempt at `__deepcopy__` support):

def transform(x, memo=None, *, transformer, _nil=[]):
    if memo is None:
        memo = {}
    recurse = functools.partial(transform, transformer=transformer)
    d = id(x)
    y = memo.get(d, _nil)
    if y is not _nil:
        return y
    # y = transform(y)  # top-down walk; untested
    cls = type(x)
    if copier := copy.copier(cls):
        y = copier(x, memo, recurse=recurse)
    else:
        rv = pickle.reduce(x)
        # not sure what this is for, but copy.deepcopy does it...
        if isinstance(rv, str):
            y = x
        else:
            y = copy.reconstruct(x, memo, rv, recurse=recurse)
    y = transformer(y)  # bottom-up walk
    # If it is its own copy, don't memoize.
    if y is not x:
        memo[d] = y
        memo.setdefault(id(memo), []).append(x)  # make sure x lives at least as long as d
    return y

I think this is sufficiently non-magical for someone to write as user/third-party-library code. And I don't think any of this puts unacceptable constraints on future versions of Python or other implementations.

The one possible issue is the `memo` thing. What if a future version of `copy` wanted to use something different from a plain dict for memoization, or wanted to use the dict differently in a way that our `setdefault` trick would break? I don't think that's an issue, but if it is, one more thing to expose:

class Memo(dict):
    def keep_alive(self, x):
        self.setdefault(id(self), []).append(x)

And then the user `transform` function has to use `copy.Memo()` in place of `{}`, and call `memo.keep_alive(x)` instead of doing it manually.

Here's an example of using it:

from dataclasses import dataclass
import functools

@dataclass
class Spam:
    x: int
    y: float
    z: list

@functools.singledispatch
def transformator(x):
    return x

@transformator.register
def _(x: int):
    return str(x)

spam = Spam(2, 3.0, z=[1, 2.0, 'abc'])
print(transform(spam, transformer=transformator))

This should print:

Spam(x='2', y=3.0, z=['1', 2.0, 'abc'])
participants (3)

- Andrew Barnert
- Antoine Pitrou
- Daniel Spitz