
On Thu, 29 Mar 2018 01:40:17 +0000 Robert Collins <robertc@robertcollins.net> wrote:
Data sharing
------------

If you pickle and then unpickle an object in the same process, passing out-of-band buffer views, then the unpickled object may be backed by the same buffer as the original pickled object.
For example, it might be reasonable to implement reduction of a Numpy array as follows (crucial metadata such as shapes is omitted for simplicity)::

    class ndarray:

        def __reduce_ex__(self, protocol):
            if protocol == 5:
                return numpy.frombuffer, (PickleBuffer(self), self.dtype)
            # Legacy code for earlier protocols omitted

Then simply passing the PickleBuffer around from ``dumps`` to ``loads`` will produce a new Numpy array sharing the same underlying memory as the original Numpy object (and, incidentally, keeping it alive)::
This seems incompatible with v4 semantics. There, a loads plus dumps combination is approximately a deep copy. This isn't. Sometimes. Sometimes it is.
True. But it's only incompatible if you pass the new ``buffer_callback`` and ``buffers`` arguments. If you don't, then you always get a copy. This is something that consumers should keep in mind.

Note there's a movement towards immutable data. For example, Dask arrays and Arrow arrays are designed as immutable.

Regards

Antoine.
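
To make the copy-versus-share distinction concrete, here is a minimal sketch using only the standard library. It assumes Python 3.8 or later, where protocol 5 and ``pickle.PickleBuffer`` are available. A plain ``bytearray`` stands in for the Numpy array, and the helper names ``roundtrip_out_of_band`` and ``roundtrip_in_band`` are made up for illustration; they are not part of any API.

```python
import pickle
from pickle import PickleBuffer


def roundtrip_out_of_band(buf):
    """Pickle ``buf`` with protocol 5, passing its memory out-of-band."""
    buffers = []
    # list.append returns None, which tells the pickler to keep the
    # buffer out of the pickle stream and hand it to the callback only.
    payload = pickle.dumps(PickleBuffer(buf), protocol=5,
                           buffer_callback=buffers.append)
    return pickle.loads(payload, buffers=buffers)


def roundtrip_in_band(buf):
    """Same data, but without the new arguments: the bytes are copied."""
    payload = pickle.dumps(PickleBuffer(buf), protocol=5)
    return pickle.loads(payload)


original = bytearray(b"abcd")

# With buffer_callback/buffers: the result is a view on the same memory.
shared = roundtrip_out_of_band(original)
memoryview(shared)[0] = ord("X")   # writes through to ``original``

# Without them: the result is an independent bytearray.
copied = roundtrip_in_band(original)
copied[0] = ord("Y")               # only touches the copy

print(original)  # bytearray(b'Xbcd')
print(copied)    # bytearray(b'Ybcd')
```

The write through ``shared`` is visible in ``original``, while the write to ``copied`` is not, so the deep-copy-like behaviour of earlier protocols is preserved unless the caller opts in to out-of-band buffers.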