[Cython] Be more forgiving about memoryview strides

Robert Bradshaw robertwb at gmail.com
Fri Mar 1 21:17:27 CET 2013


On Fri, Mar 1, 2013 at 7:56 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Thu, 2013-02-28 at 23:25 -0800, Robert Bradshaw wrote:
>> On Thu, Feb 28, 2013 at 11:12 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> > On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw <robertwb at gmail.com> wrote:
>> >> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg
>> >> <sebastian at sipsolutions.net> wrote:
>> >>> Hey,
>> >>>
>> >>> Maybe someone here already saw it (I don't have a track account, or I
>> >>> would just create a ticket), but it would be nice if Cython was more
>> >>> forgiving about contiguous requirements on strides. In the future this
>> >>> would make it easier for numpy to go forward with changing the
>> >>> contiguous flags to be more reasonable for its purpose, and second also
>> >>> to allow old (and maybe for the moment remaining) corner cases in numpy
>> >>> to slip past (as well as possibly the same for other programs...). An
>> >>> example is (see also https://github.com/numpy/numpy/issues/2956 and the
>> >>> PR linked there for more details):
>> >>>
>> >>> def add_one(array):
>> >>>     cdef double[::1] a = array
>> >>>     a[0] += 1.
>> >>>     return array
>> >>>
>> >>> giving:
>> >>>
>> >>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100]))
>> >>> ValueError: Buffer and memoryview are not contiguous in the same
>> >>> dimension.
>> >>>
>> >>> This could easily be changed if MemoryViews check the strides as "can be
>> >>> interpreted as contiguous". That means that if shape[i] == 1, then
>> >>> strides[i] are arbitrary (you can just change them if you like). This is
>> >>> also the case for 0-sized arrays, which are arguably always contiguous,
>> >>> no matter their strides are!
>> >>
>> >> I was under the impression that the primary value for contiguous is
>> >> that it a foo[::1] can be interpreted as a foo*. Letting strides be
>> >> arbitrary completely breaks this, right?
>> >
>> > Nope. The natural definition of "C contiguous" is "the array entries
>> > are arranged in memory in the same way they would be if they were a
>> > multidimensional C array" (i.e., what you said.) But it turns out that
>> > this is *not* the definition that numpy and cython use!
>> >
>> > The issue is that the above definition is a constraint on the actual
>> > locations of items in memory, i.e., given a shape, it tells you that
>> > for every index,
>> >  (a)  sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * itemsize)
>> > Obviously this equality holds if
>> >  (b)  strides == cumprod(shape[::-1])[::-1] * itemsize
>> > (Or for F-contiguity, we have
>> >  (b')  strides == cumprod(shape) * itemsize
>> > )
>> >
>> > (a) is the natural definition of "C contiguous". (b) is the definition
>> > of "C contiguous" used by numpy and cython. (b) implies (a). But (a)
>> > does not imply (b), i.e., there are arrays that are C-contiguous which
>> > numpy and cython think are discontiguous. (Also in numpy there are
>> > some weird cases where numpy accidentally uses the correct definition,
>> > I think, which is the point of Sebastian's example.)
>> >
>> > In particular, if shape[i] == 1, then the value of stride[i] really
>> > should be irrelevant to judging contiguity, because the only thing you
>> > can do with strides[i] is multiply it by index[i], and if shape[i] ==
>> > 1 then index[i] is always 0. So an array of int8's with shape = (10,
>> > 1), strides = (1, 73) is contiguous according to (a), but not
>> > according to (b). Also if shape[i] is 0 for any i, then the entire
>> > contents of the strides array becomes irrelevant to judging
>> > contiguity; all zero-sized arrays are contiguous according to (a), but
>> > not (b).
>>
>> Thanks for clarifying.
>>
>> Yes, I think it makes a lot of sense to loosen our definition for
>> Cython. Internally, I think the only way we use this assumption is in
>> not requiring that the first/final index be multiplied by the stride,
>> which should be totally fine. But this merits closer inspection as
>> there may be something else.
>
> The only problem I saw was code that used strides[-1] instead of the
> itemsize (e.g. using strides[i]/strides[-1] to then index the typed
> buffer instead of using strides[i]/itemsize). But that should be easy to
> check, numpy had two or so cases of that itself...

I'd be surprised if we do that, but the only way to be sure would be
to look at the code.

>> > (This is really annoying for numpy because given, say, a column vector
>> > with shape (n, 1), it is impossible to be both C- and F-contiguous
>> > according to the (b)-style definition. But people expect expect
>> > various operations to preserve C versus F contiguity, so there are
>> > heuristics in numpy that try to guess whether various result arrays
>> > should pretend to be C- or F-contiguous, and we don't even have a
>> > consistent idea of what it would mean for this code to be working
>> > correctly, never mind test it and keep it working. OTOH if we just fix
>> > numpy to use the (a) definition, then it turns out a bunch of
>> > third-party code breaks, like, for example, cython.)
>>
>> Can you give some examples?
>>
>
> Not sure for what :).

I meant examples of possible breakage.

> Maybe this is an example:
>
> In [1]: a = np.asmatrix(np.arange(9).reshape(3,3).T)
>
> In [2]: a.flags.f_contiguous
> Out[2]: True
>
> In [3]: a[:,0].flags
> Out[3]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   ...
>
> Where that view could just as well be F-contiguous, and the fact that
> numpy, when in doubt, prefers C-contiguous might be surprising. And
> since it would be less strict to begin with, numpy may safe a copy here
> or there (without adding weird stride fixing code).
>
> Examples for code breakage would be this check as well as scikit-learn
> and scipy in 3 or 4 cases making the assumption above of itemsize ==
> strides[-1] for c-contiguous arrays.

Ah.

So, assuming Cython itself isn't making such assumptions, what support
do you want from Cython? I can see (1) accepting as c/f contiguous
arrays that meet this looser definition and (2) setting these flags in
memoryviews we produce under this looser definition. Is there anything
else?

- Robert


More information about the cython-devel mailing list