[Numpy-discussion] Pickling of memory aliasing patterns

Eelco Hoogendoorn hoogendoorn.eelco at gmail.com
Fri Feb 28 07:00:19 EST 2014

I have been working on a general function caching mechanism, and in doing 
so I stumbled upon the following quirk:

    import numpy as np

    def foo(a, b):
        b[0] = 1
        return a[0]

    # foo is assumed to be wrapped by the caching mechanism mentioned above
    a = np.zeros(1)
    b = a[:]              # b is a view on a's memory
    print(foo(a, b))      # computes and returns 1
    print(foo(a, b))      # gets 1 from cache, as it should
    a = np.zeros(1)       # no aliasing between inputs
    b = np.zeros(1)
    print(foo(a, b))      # should compute and return 0, but instead gets 1 from cache

Fundamentally, this is because it turns out that the memory aliasing 
patterns that arrays may have are lost during pickling. This leads me to 
two questions:

1: Is this desirable behavior?
2: Is this preventable behavior?

It seems to me the answer to the first question is no, and the answer to 
the second question is yes.
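The loss of aliasing can be checked directly. A minimal sketch, assuming Python 3 and NumPy 1.11 or later (for `np.shares_memory`): the aliased pair `(a, a[:])` pickles to exactly the same bytes as two independent zero arrays, which is why a cache keyed on pickled arguments confuses the two calls above, and after a round trip the two arrays no longer share memory.

```python
import pickle

import numpy as np

a = np.zeros(1)
b = a[:]                                  # b aliases a's memory
assert np.shares_memory(a, b)

# The aliased pair and a pair of independent arrays pickle identically,
# so a cache keyed on the pickled arguments cannot tell them apart.
aliased = pickle.dumps((a, b))
independent = pickle.dumps((np.zeros(1), np.zeros(1)))
print(aliased == independent)             # True

# After a round trip, each array owns a separate copy of the data.
a2, b2 = pickle.loads(aliased)
b2[0] = 1
print(np.shares_memory(a2, b2), a2[0])    # False 0.0
```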

Here is what I am using at the moment to generate a correct hash under such 
circumstances. Unpickling along these lines should be possible too, 
methinks. Or am I missing some subtlety as to why something along these 
lines couldn't be the default pickling behavior for numpy arrays?

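import numpy as np

class ndarray_own(object):
    """Stand-in for an array that owns its memory."""
    def __init__(self, arr):
        # raw bytes of the owned memory (np.getbuffer only exists on Python 2)
        self.buffer     = arr.tobytes(order='A')
        self.dtype      = arr.dtype
        self.shape      = arr.shape
        self.strides    = arr.strides

class ndarray_view(object):
    """Stand-in for a view: its base, plus where in the base the view sits."""
    def __init__(self, arr):
        self.base       = arr.base
        # so we have a view; but where is it? Byte offset of the view's
        # first element relative to the start of its base
        self.offset     = arr.ctypes.data - self.base.ctypes.data
        self.dtype      = arr.dtype
        self.shape      = arr.shape
        self.strides    = arr.strides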
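The fields recorded by `ndarray_view` are enough to rebuild an aliasing view when unpickling, provided the base array has already been restored. A minimal sketch under that assumption: the `np.ndarray` constructor accepts a `buffer`, a byte `offset` and `strides`, so a view can be recreated from exactly those fields. `rebuild_view` is a hypothetical helper, not a NumPy function.

```python
import numpy as np


def rebuild_view(base, offset, dtype, shape, strides):
    """Hypothetical helper: recreate a view on `base` from the fields that
    ndarray_view records (byte offset into base, dtype, shape, strides)."""
    return np.ndarray(shape, dtype=dtype, buffer=base,
                      offset=offset, strides=strides)


base = np.arange(10.0)
view = base[2:8:2]                                # elements 2, 4, 6
offset = view.ctypes.data - base.ctypes.data      # 2 * 8 = 16 bytes

rebuilt = rebuild_view(base, offset, view.dtype, view.shape, view.strides)
rebuilt[0] = -1.0                                 # writes through to base[2]
print(base[2], np.shares_memory(rebuilt, base))   # -1.0 True
```

Because the rebuilt array reads and writes the base's memory, the aliasing between the two is restored rather than copied.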

class NumpyDeterministicPickler(DeterministicPickler):
    """Special case for numpy.

    In general, external C objects may include internal state which does
    not serialize in the way we want it to; ndarray memory aliasing is one
    of those things.
    """

    def save(self, obj):
        """Remap a numpy array to a representation which conserves all
        semantically relevant information concerning memory aliasing.

        Note that this mapping is 'destructive': we will not get our
        original numpy arrays back after unpickling without custom
        deserialization code. But we don't care, since this is only meant
        to be used to obtain correct keying behavior; keys don't need to be
        deserialized.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags.owndata:
                obj = ndarray_own(obj)
            else:
                obj = ndarray_view(obj)
        DeterministicPickler.save(self, obj)
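For Python 3, a minimal sketch of the same keying idea using only the standard library, assuming Python 3.8+ for the `pickle.Pickler.reducer_override` hook. This swaps the `save` override for `reducer_override`, since the definition of `DeterministicPickler` is not shown; `AliasAwarePickler` and `alias_aware_key` are hypothetical names. An owning array is encoded as its bytes and layout, a view as its base plus offset and layout.

```python
import hashlib
import io
import pickle

import numpy as np


class AliasAwarePickler(pickle.Pickler):
    """Hypothetical pickler for cache keys: encodes ndarrays together with
    their memory-aliasing pattern. Output is not meant to be unpickled."""

    def reducer_override(self, obj):
        if not isinstance(obj, np.ndarray):
            return NotImplemented          # normal pickling for everything else
        if isinstance(obj.base, np.ndarray):
            # A view: record its base array and where in the base it starts.
            offset = obj.ctypes.data - obj.base.ctypes.data
            desc = ("view", obj.base, offset, obj.dtype, obj.shape, obj.strides)
        else:
            # Owns its memory (or wraps a non-array buffer): record contents.
            desc = ("own", obj.tobytes(order="A"), obj.dtype, obj.shape,
                    obj.strides)
        # Reduce the array to tuple(desc). The pickler memoizes each array it
        # reduces, so a base that was already pickled is written as a
        # back-reference instead of a second copy of its contents.
        return tuple, (desc,)


def alias_aware_key(*args):
    """Hypothetical cache key: SHA-1 of the alias-aware pickle of args."""
    buf = io.BytesIO()
    AliasAwarePickler(buf, protocol=4).dump(args)
    return hashlib.sha1(buf.getvalue()).hexdigest()


a = np.zeros(1)
print(alias_aware_key(a, a[:]) == alias_aware_key(np.zeros(1), np.zeros(1)))  # False
```

Because the base is pickled by memo reference, a view of a different array with identical contents also gets a different key from a view of `a` itself.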
