[Numpy-discussion] Preserving NumPy views when pickling

Nathaniel Smith njs at pobox.com
Tue Oct 25 23:36:29 EDT 2016


On Tue, Oct 25, 2016 at 5:09 PM, Matthew Harrigan
<harrigan.matthew at gmail.com> wrote:
> It seems pickle keeps track of references for basic python types.
>
> import pickle
>
> x = [1]
> y = [x]
> x, y = pickle.loads(pickle.dumps((x, y)))
> x.append(2)
> print(y)
> # prints [[1, 2]]

Yes, but the problem is: suppose I have a 10 gigabyte array, and then
take a 20 byte slice of it, and then pickle that slice. Do you expect
the pickle file to be 20 bytes, or 10 gigabytes? Both options are
possible, but you have to pick one, and numpy picks 20 bytes: it
serializes only the data the slice covers, not the base array. The
advantage is obviously that you don't have mysterious 10 gigabyte
pickle files; the disadvantage is that you can't reconstruct the view
relationships afterwards. (You might think: oh, but we can be clever,
and only record the view relationships if the user pickles both
objects together. But while pickle might know whether the user is
pickling both objects together, it unfortunately doesn't tell numpy,
so we can't really do anything clever or different in this case.)
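
To make the tradeoff concrete, here is a minimal sketch. The 1 MB array
is just a stand-in for the 10 gigabyte one, and the names base, view,
base2 and view2 are made up for illustration. It checks that the pickle
of the slice stays small, and that the two arrays no longer share
memory after a round trip, even when they are pickled together.

```python
import pickle

import numpy as np

# Stand-in for the "10 gigabyte" array: 1 MB of bytes.
base = np.zeros(1_000_000, dtype=np.uint8)
# A 20-byte slice; this is a view that shares base's memory.
view = base[:20]

# Before pickling, the view relationship is there.
shared_before = np.shares_memory(base, view)

# Pickling the slice alone stores only the slice's data plus a
# small header, not the whole base array.
payload_size = len(pickle.dumps(view))

# Pickling both objects together still produces two independent
# arrays on load; the view relationship is not reconstructed.
base2, view2 = pickle.loads(pickle.dumps((base, view)))
shared_after = np.shares_memory(base2, view2)

print(shared_before, payload_size, shared_after)
```

The exact payload size depends on the pickle protocol and numpy
version, so the sketch only relies on it being far smaller than the
base array.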

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


