
On Fri, Feb 7, 2020 at 1:51 PM Andrew Barnert <abarnert@yahoo.com> wrote:
> Not to argue against my own proposal here, but… I was only suggesting exposing the reduction machinery, and I think you also need the graph-walking machinery exposed. Or would my existing proposal on its own actually be enough for you?
Ah, good point: indeed, your existing proposal, to just expose a shallow reduce/deconstruct, would not be sufficient to support my project. However, I don't want to hijack your proposal; I do think a shallow form of this would be valuable and less controversial than full traversal and reduction of an arbitrary object graph, and if that could be brought to python-dev, as Antoine suggests, it's worth doing.
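(To make "shallow reduce/deconstruct" concrete for anyone following along: it's roughly what the existing, low-level __reduce_ex__ protocol already returns today, one level deep and nothing recursive. A minimal illustration, using a throwaway Point class:)

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(1, 2)
    # One level of deconstruction: a reconstruction callable, its
    # arguments, and the object's state. Nothing here recurses into
    # the state values themselves.
    func, args, state = p.__reduce_ex__(2)[:3]
    print(func, args, state)  # copyreg.__newobj__, (Point,), {'x': 1, 'y': 2}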
> That being said, if you have a use for the object graph walk, that shouldn't be too hard to write. It's just that there are nontrivial design issues to answer. Do we need to expose the graph itself in some form, or just the memoized walk iterator? Is an iterator the right way to expose it, as opposed to something like the Smalltalk-ish walker classes in the AST module?
(See my comments further down for more on this.) An entire graph structure isn't necessary for my use case. An iterator over values would be insufficient, because it wouldn't assist you in building up the replacement structure. An iterator over node objects could work, but it might be awkward to mutate them as you iterate. I think a walker is probably most appropriate; see the sketch below.
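To make "walker" concrete, here is a purely hypothetical sketch of the kind of interface I mean; none of these names exist anywhere, the type-based dispatch just mirrors ast.NodeTransformer, and a real version would handle traversal and memoization itself:

    # Hypothetical names throughout: GraphTransformer and visit_* are
    # invented for illustration, not part of any real module.
    class GraphTransformer:
        def visit(self, obj):
            # Dispatch on the value's type; a visitor returns either the
            # original object or a replacement to splice into the graph.
            method = getattr(self, "visit_" + type(obj).__name__, None)
            return method(obj) if method is not None else obj

    class IntToStr(GraphTransformer):
        def visit_int(self, obj):
            return str(obj)  # replace the node; deep structure is preserved

    assert IntToStr().visit(3) == "3"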
> Should it iterate trivial objects (the core builtin types that pickle knows it doesn't need to further reduce—but you can get their reduction anyway if you want)? Do we need to expose a function to let user code check whether an object is trivial in this sense?
So, my solution to this was a big hack – you pickle and unpickle *twice*. It would be great if a hack like that weren't necessary, so I'd be in favor of an option to visit trivial objects. To explain the hack: when pickling an object graph containing trivial values, you can't affect how the trivial values get reduced. However, you *can* affect how they get *unpickled*, by overriding find_class() on a custom Unpickler subclass. So when you unpickle, you replace each trivial value with an instance of a special *subclass* of the trivial value's type. Then, when you pickle for the *second* time, you can apply your custom transformation, because the value no longer gets skipped by the traversal machinery: its type is no longer the original trivial type but a "mock" subclass of it, with its own overridden __reduce__(). You can see it in action, transforming an int into a string, here: https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f#file-gr... . A simplified sketch is below.
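For reference, here is a simplified, self-contained version of the idea. The gist's actual mechanics differ in detail; in this sketch the wrapping happens in a substituted container class, and Point, WrappingPoint, and MockInt are throwaway names for illustration:

    import io
    import pickle

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    class MockInt(int):
        # Because its type is no longer exactly int, pickle's fast path
        # for trivial values no longer applies and __reduce__ is honored.
        def __reduce__(self):
            return (str, (int(self),))  # re-pickle this int as a string

    class WrappingPoint(Point):
        # Substituted for Point during the first unpickle; wraps trivial
        # attribute values in MockInt so the second pickle can see them.
        def __setstate__(self, state):
            self.__dict__.update({k: MockInt(v) if type(v) is int else v
                                  for k, v in state.items()})

    class WrappingUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if (module, name) == (__name__, "Point"):
                return WrappingPoint
            return super().find_class(module, name)

    p = Point(3, 4)
    # Round trip 1: substitute the wrapping class; ints become MockInts.
    wrapped = WrappingUnpickler(io.BytesIO(pickle.dumps(p))).load()
    # Round trip 2: MockInt.__reduce__ fires, turning the ints into
    # strings. (A full version would also map WrappingPoint back to Point.)
    result = pickle.loads(pickle.dumps(wrapped))
    print(repr(result.x), repr(result.y))  # '3' '4'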
> Should it iterate the objects themselves, or some kind of node object? Does it need to (optionally?) yield anything for references to already-visited nodes—and, if so, what does that look like?
The way my library works is by ignoring the idea of traversal completely, and merely specifying which (shallow) transformations to apply to which types within an object graph. The traversal order is assumed to be "undefined": as long as pickle visits everything, it doesn't matter what the order is. Objects in cycles, or with multiple references to them, are assumed to be visited only once, just as those objects only get reduced once during pickling.

I think, abstractly, what my library is doing is walking the nodes of the object graph (in an arbitrary, duplicate-free order) and doing replace operations on some of the nodes (according to their value's type) while preserving the rest of the deep structure. Deeper changes in structure are achieved by applying successive full-graph transformations iteratively (see the usage of .transform_iterative() in the demo, e.g. here: https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f#file-gr...). An AST walker, with the ability to register custom visitors and transformers, seems like the closer abstraction; the ast module's NodeTransformer, shown below, is the model I have in mind.
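For concreteness, here's the real ast.NodeTransformer doing the same kind of type-directed, structure-preserving replacement on a syntax tree (Python 3.8+, where literals are ast.Constant nodes):

    import ast

    class IntToStr(ast.NodeTransformer):
        # Visit every Constant node; returning a new node replaces the
        # old one in place, while the rest of the tree is left untouched.
        def visit_Constant(self, node):
            if type(node.value) is int:
                return ast.copy_location(ast.Constant(value=str(node.value)), node)
            return node

    tree = IntToStr().visit(ast.parse("x = 1 + 2"))
    print(ast.dump(tree))  # the 1 and 2 are now Constant(value='1') etc.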
> Should pickle and deepcopy be rewritten to use the public walker, or should they continue to inline the walk throughout their code and the walker is just a parallel implementation of the same logic—and, if the latter, do we need to guarantee that it always implements identical logic, or just "good" logic?
I think it would be great if pickle's traversal logic were exposed separately from pickle itself, but I imagine decoupling it that way would put unnecessary pressure on the Python devs to maintain a larger surface area of public functionality. (I'm not at all familiar with the existing code used for traversal, though I'm curious to read it now.) So I would opt for the second option: a parallel implementation that does its best to keep up with pickle's behavior. I would keep some of the behavior "undefined", like traversal order, so that users don't come to depend on a particular order if it ever changes.
> It's also possible that some of these questions have answers that aren't obvious from first principles but would become obvious as soon as we start implementing it. And this might be a fun weekend project, so maybe it's worth just diving in to see what we run into, and then asking -ideas about the remaining open questions. If you don't want to try it, there's a good chance I will, but no promises as to when.
I agree it would be a fun weekend project :). However, my next few weekends are not available. If you're interested in coordinating on it, let me know and perhaps we can find a time to get started. You're also welcome to jump right in without me if you have the time and motivation.

Daniel Spitz