Re: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

March 29, 2018

On Thu, 29 Mar 2018 01:40:17 +0000, Robert Collins <robertc@robertcollins.net> wrote:
...
...
Data sharing
------------
If you pickle and then unpickle an object in the same process, passing
out-of-band buffer views, then the unpickled object may be backed by the
same buffer as the original pickled object.

For example, it might be reasonable to implement reduction of a Numpy array
as follows (crucial metadata such as shapes is omitted for simplicity)::

   class ndarray:

      def __reduce_ex__(self, protocol):
         if protocol == 5:
            return numpy.frombuffer, (PickleBuffer(self), self.dtype)
         # Legacy code for earlier protocols omitted

Then simply passing the PickleBuffer around from ``dumps`` to ``loads``
will produce a new Numpy array sharing the same underlying memory as the
original Numpy object (and, incidentally, keeping it alive)::

This seems incompatible with v4 semantics. There, a loads plus dumps
combination is approximately a deep copy. This isn't. Sometimes. Sometimes
it is.

True.  But it's only incompatible if you pass the new
``buffer_callback`` and ``buffers`` arguments.  If you don't, then you
always get a copy, exactly as with protocol 4.  This is something that
consumers who opt in to out-of-band buffers should keep in mind.
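
To make the distinction concrete, here is a minimal sketch. It assumes the
API in the form that later shipped in the standard ``pickle`` module of
Python 3.8 (``pickle.PickleBuffer``, the ``buffer_callback`` argument to
``dumps`` and the ``buffers`` argument to ``loads``), and it uses a plain
``bytearray`` as a stand-in for a Numpy array so that it runs without any
third-party package. The variable names are illustrative only.

```python
import pickle
from pickle import PickleBuffer

# Stand-in for an array's memory (assumption: any writable buffer would do).
original = bytearray(b"abc")

# Case 1: no buffer_callback, so the data travels in-band inside the pickle.
# The round trip yields an independent copy, as with protocol 4.
copy = pickle.loads(pickle.dumps(PickleBuffer(original), protocol=5))
copy[0] = ord("X")
assert original == b"abc"  # the original is untouched

# Case 2: buffer_callback collects the buffers out-of-band, and the same
# buffers are handed back to loads() in the same process.
buffers = []
payload = pickle.dumps(
    PickleBuffer(original), protocol=5, buffer_callback=buffers.append
)
shared = pickle.loads(payload, buffers=buffers)

# Writing through the unpickled object now changes the original memory.
with memoryview(shared) as view:
    view[0] = ord("Z")
assert original == b"Zbc"
```

In the second case the pickle stream itself carries no array data, only a
marker saying "take the next buffer", which is why the result aliases the
original memory instead of copying it.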

Note there's a movement towards immutable data. For example, Dask
arrays and Arrow arrays are designed as immutable.

Regards

Antoine.

Antoine Pitrou