
On Fri, Feb 7, 2020 at 1:51 PM Andrew Barnert <abarnert@yahoo.com> wrote:
> Not to argue against my own proposal here, but… I was only suggesting exposing the reduction machinery, and I think you also need the graph-walking machinery exposed. Or would my existing proposal on its own actually be enough for you?
Ah, good point: indeed, your existing proposal, to just expose a shallow reduce/deconstruct, would not be sufficient to support my project. However, I don't want to hijack your proposal; I do think a shallow form of this would be valuable and less controversial than full traversal and reduction of an arbitrary object graph, and if that could be brought to python-dev, as Antoine suggests, it's worth doing.
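(To make "shallow reduce/deconstruct" concrete for anyone following along: it's roughly what the existing, low-level __reduce_ex__ protocol already returns today, one level deep and nothing recursive. A minimal illustration, using a throwaway Point class:)

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(1, 2)
    # One level of deconstruction: a reconstruction callable, its
    # arguments, and the object's state. Nothing here recurses into
    # the state values themselves.
    func, args, state = p.__reduce_ex__(2)[:3]
    print(func, args, state)  # copyreg.__newobj__, (Point,), {'x': 1, 'y': 2}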
> That being said, if you have a use for the object graph walk, that shouldn't be too hard to write. It's just that there are nontrivial design issues to answer. Do we need to expose the graph itself in some form, or just the memoized walk iterator? Is an iterator the right way to expose it, as opposed to something like the Smalltalk-ish walker classes in the AST module?
(See my comments further down for more on this.) An entire graph structure isn't necessary for my use case. An iterator over values would be insufficient, because it wouldn't assist you in building up the replacement structure. An iterator over node objects could work, but it might be awkward to mutate them as you iterate. I think a walker is probably most appropriate; see the sketch below.
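To make "walker" concrete, here is a purely hypothetical sketch of the kind of interface I mean; none of these names exist anywhere, the type-based dispatch just mirrors ast.NodeTransformer, and a real version would handle traversal and memoization itself:

    # Hypothetical names throughout: GraphTransformer and visit_* are
    # invented for illustration, not part of any real module.
    class GraphTransformer:
        def visit(self, obj):
            # Dispatch on the value's type; a visitor returns either the
            # original object or a replacement to splice into the graph.
            method = getattr(self, "visit_" + type(obj).__name__, None)
            return method(obj) if method is not None else obj

    class IntToStr(GraphTransformer):
        def visit_int(self, obj):
            return str(obj)  # replace the node; deep structure is preserved

    assert IntToStr().visit(3) == "3"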
> Should it iterate trivial objects (the core builtin types that pickle knows it doesn't need to further reduce—but you can get their reduction anyway if you want)? Do we need to expose a function to let user code check whether an object is trivial in this sense?
So, my solution to this was a big hack – you pickle and unpickle *twice*. It would be great if a hack like that weren't necessary, so I'd be in favor of an option to visit trivial objects. To explain the hack: when pickling an object graph containing trivial values, you can't affect how the trivial values get reduced. However, you *can* affect how they get *unpickled*, by overriding find_class() on a custom Unpickler subclass. So when you unpickle, you replace each trivial value with an instance of a special *subclass* of the trivial value's type. Then, when you pickle for the *second* time, you can apply your custom transformation, because the value no longer gets skipped by the traversal machinery: its type is no longer the original trivial type but a "mock" subclass of it, with its own overridden __reduce__(). You can see it in action, transforming an int into a string, here: https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f#file-gr... . A simplified sketch is below.
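For reference, here is a simplified, self-contained version of the idea. The gist's actual mechanics differ in detail; in this sketch the wrapping happens in a substituted container class, and Point, WrappingPoint, and MockInt are throwaway names for illustration:

    import io
    import pickle

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    class MockInt(int):
        # Because its type is no longer exactly int, pickle's fast path
        # for trivial values no longer applies and __reduce__ is honored.
        def __reduce__(self):
            return (str, (int(self),))  # re-pickle this int as a string

    class WrappingPoint(Point):
        # Substituted for Point during the first unpickle; wraps trivial
        # attribute values in MockInt so the second pickle can see them.
        def __setstate__(self, state):
            self.__dict__.update({k: MockInt(v) if type(v) is int else v
                                  for k, v in state.items()})

    class WrappingUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if (module, name) == (__name__, "Point"):
                return WrappingPoint
            return super().find_class(module, name)

    p = Point(3, 4)
    # Round trip 1: substitute the wrapping class; ints become MockInts.
    wrapped = WrappingUnpickler(io.BytesIO(pickle.dumps(p))).load()
    # Round trip 2: MockInt.__reduce__ fires, turning the ints into
    # strings. (A full version would also map WrappingPoint back to Point.)
    result = pickle.loads(pickle.dumps(wrapped))
    print(repr(result.x), repr(result.y))  # '3' '4'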
> Should it iterate the objects themselves, or some kind of node object? Does it need to (optionally?) yield anything for references to already-visited nodes—and, if so, what does that look like?
The way my library works is by ignoring the idea of traversal completely, and merely specifying which (shallow) transformations to apply to which types within an object graph. The traversal order is assumed to be "undefined": as long as pickle visits everything, it doesn't matter what the order is. Objects in cycles, or with multiple references to them, are assumed to be visited only once, just as those objects only get reduced once during pickling.

I think, abstractly, what my library is doing is walking the nodes of the object graph (in an arbitrary, duplicate-free order) and doing replace operations on some of the nodes (according to their value's type) while preserving the rest of the deep structure. Deeper changes in structure are achieved by applying successive full-graph transformations iteratively (see the usage of .transform_iterative() in the demo, e.g. here: https://gist.github.com/spitz-dan-l/3375a61fac6fe150574fac791567af4f#file-gr...). An AST walker, with the ability to register custom visitors and transformers, seems like the closer abstraction; the ast module's NodeTransformer, shown below, is the model I have in mind.
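For concreteness, here's the real ast.NodeTransformer doing the same kind of type-directed, structure-preserving replacement on a syntax tree (Python 3.8+, where literals are ast.Constant nodes):

    import ast

    class IntToStr(ast.NodeTransformer):
        # Visit every Constant node; returning a new node replaces the
        # old one in place, while the rest of the tree is left untouched.
        def visit_Constant(self, node):
            if type(node.value) is int:
                return ast.copy_location(ast.Constant(value=str(node.value)), node)
            return node

    tree = IntToStr().visit(ast.parse("x = 1 + 2"))
    print(ast.dump(tree))  # the 1 and 2 are now Constant(value='1') etc.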
> Should pickle and deepcopy be rewritten to use the public walker, or should they continue to inline the walk throughout their code and the walker is just a parallel implementation of the same logic—and, if the latter, do we need to guarantee that it always implements identical logic, or just "good" logic?
I think it would be great if pickle's traversal logic were exposed separately from pickle itself, but I imagine decoupling it that way would put unnecessary pressure on the Python devs to maintain a larger surface area of public functionality. (I'm not at all familiar with the existing code used for traversal, though I'm curious to read it now.) So I would opt for the second option: a parallel implementation that does its best to keep up with pickle's behavior. I would keep some of the behavior "undefined", like traversal order, so that users don't come to depend on a particular order if it ever changes.
> It's also possible that some of these questions have answers that aren't obvious from first principles but would become obvious as soon as we start implementing it. And this might be a fun weekend project, so maybe it's worth just diving in to see what we run into, and then asking -ideas about the remaining open questions. If you don't want to try it, there's a good chance I will, but no promises as to when.
I agree it would be a fun weekend project :). However, my next few weekends are not available. If you're interested in coordinating on it, let me know and perhaps we can find a time to get started. You're also welcome to jump right in without me if you have the time and motivation.

Daniel Spitz