[Numpy-discussion] Numpy's definition of contiguous arrays

Sebastian Berg sebastian at sipsolutions.net
Tue Dec 4 12:08:32 EST 2012


Hi,

Maybe someone who was not yet aware of this has an opinion about how it
can be handled.

In current numpy master (probably being reverted), the definition for
contiguous arrays is changed such that it means "Contiguous in memory"
and nothing more. What this means is this:

1. An array of shape (1,3,1) is both C- and F-contiguous (assuming
`arr.strides[1] == arr.itemsize`).
2. However, it is incorrect to assume that
`arr.strides[-1] == arr.itemsize`, because the corresponding axis has
length 1, so its stride does not matter for the memory layout. Other
similar assumptions about "clean strides" are also incorrect. (This was
always incorrect in corner cases.)
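
To make the two points above concrete, here is a minimal pure-Python
sketch of a "contiguous in memory and nothing more" check. It is not
NumPy's actual implementation; the function names are made up for
illustration, and treating empty arrays as contiguous is an assumption.

```python
def is_c_contiguous(shape, strides, itemsize):
    """Sketch of the relaxed check: axes of length 1 are ignored."""
    if 0 in shape:
        return True  # assumption: an empty array counts as contiguous
    expected = itemsize
    # Walk from the last (fastest varying) axis to the first.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim == 1:
            continue  # the stride of a length-1 axis is never used
        if stride != expected:
            return False
        expected *= dim
    return True


def is_f_contiguous(shape, strides, itemsize):
    """F-order is C-order with the axes reversed."""
    return is_c_contiguous(shape[::-1], strides[::-1], itemsize)


# Shape (1,3,1) with 8-byte items: only strides[1] matters, so the
# arbitrary values on the length-1 axes do not affect the result.
shape, strides = (1, 3, 1), (4096, 8, 12345)
both = (is_c_contiguous(shape, strides, 8),
        is_f_contiguous(shape, strides, 8))
```

With these inputs `both` is `(True, True)`, even though `strides[-1]` is
not equal to the item size.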

I think most will agree that this change reflects what these flags
should indicate, because the exact value of the strides is not really
important for the memory layout and for example for a row vector there
is no reason to say it cannot be both C- and F-contiguous.

However, the change broke some code in scipy as well as sk-learn that
relied on `arr.strides[-1] == arr.itemsize` (for C-contiguous arrays).
The fact that nobody ever noticed that this isn't quite correct
indicates that there is certainly more code out there just like it.

There is more discussion here: https://github.com/numpy/numpy/pull/2735
with suggestions for a possible deprecation process, such as having both
definitions next to each other and deprecating the current one.
I was personally wondering if it is good enough to ensure strides are
cleaned up when an array is explicitly requested as contiguous which
means:

    np.array(arr, copy=False, order='C').strides[-1] == arr.itemsize

is always True, but:

    if arr.flags.c_contiguous:
        # It is possible that:
        arr.strides[-1] != arr.itemsize

This would fix the problems found so far, since typically if you want to
rely on the fact that an array is contiguous, you use this kind of call
to make sure it is. But I guess it is likely too dangerous to assume that
nobody only checks the flags and then continues to make unwarranted
assumptions about strides.
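
A small sketch of the risky pattern, assuming a NumPy build in which the
relaxed definition described above is in effect. The helper name is
hypothetical, and the last step deliberately forces a copy (unlike the
`copy=False` call above) so that the strides are computed from scratch.

```python
import numpy as np


def has_clean_last_stride(arr):
    """Hypothetical helper: does the last axis step by exactly one item?"""
    return arr.strides[-1] == arr.itemsize


# An F-ordered (3, 4) float64 array has strides (8, 24).
base = np.zeros((3, 4), dtype=np.float64, order='F')

# Keeping only the first column gives a view of shape (3, 1) that
# inherits the strides (8, 24) from its parent.
view = base[:, :1]

# Under the relaxed definition the flag is set, because the last axis
# has length 1 and its stride is irrelevant to the memory layout ...
flag_says_contiguous = view.flags.c_contiguous
# ... yet the "clean strides" assumption does not hold for this view.
stride_is_clean = has_clean_last_stride(view)

# Conservative alternative: a fresh C-ordered copy gets newly computed
# strides, at the cost of copying the data.
fresh = np.array(view, order='C', copy=True)
```

Code that only tests `view.flags.c_contiguous` and then indexes memory
using `view.strides[-1]` would go wrong here, while the same code run on
`fresh` would not.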

Best Regards,

Sebastian



