[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

David Cournapeau cournape at gmail.com
Sat Jan 13 00:42:36 EST 2007

On 1/13/07, Christopher Barker <Chris.Barker at noaa.gov> wrote:
> I think it may have all been cleared up now, but just in case:
>  >>> a
> array([[[ 0,  1,  2,  3],
>          [ 4,  5,  6,  7],
>          [ 8,  9, 10, 11]],
>         [[12, 13, 14, 15],
>          [16, 17, 18, 19],
>          [20, 21, 22, 23]]])
>  >>> a.flags
>    C_CONTIGUOUS : True
>    F_CONTIGUOUS : False
>    OWNDATA : True
>    WRITEABLE : True
>    ALIGNED : True
> So here is the most common case: Contiguous, aligned, etc. This means
> the data buffer is a "normal C array", and you can iterate through it in
> the normal C way.
>  >>> b = a[:,2,:]
>  >>> b
> array([[ 8,  9, 10, 11],
>         [20, 21, 22, 23]])
> So b is a slice pulled out of a, sharing the same data block, but not
> all of it!
>  >>> b.flags
>    C_CONTIGUOUS : False
>    F_CONTIGUOUS : False
>    OWNDATA : False
>    WRITEABLE : True
>    ALIGNED : True
>  >>>
> So it is no longer a contiguous array. This means that you are looking
> at the same block of memory as above, but only some of it is used for
> this array, and you need to use the strides to get at the elements that
> are part of this array.
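[For illustration, the strided access described above can be checked directly; this is a small sketch added for clarity, not part of the original exchange. The slice shares a's buffer but skips over the rows that are not part of it, which the strides make explicit.]

```python
import numpy as np

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
b = a[:, 2, :]          # the slice from the example above

print(a.strides)        # (96, 32, 8): a C-contiguous int64 buffer
print(b.strides)        # (96, 8): consecutive rows of b are 96 bytes apart
# element b[i, j] lives at address
#   b.ctypes.data + i * b.strides[0] + j * b.strides[1]
```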
I finally understood that point, which was exactly what kept me from
seeing the real issue. I had totally forgotten about views, which are
not copies in numpy (contrary to matlab, where my experience with C
extensions lies), and which account for all those cases... I kept
wondering how you could create a non-aligned or non-packed array in
numpy, but once you start thinking about views, it all makes sense...
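[A minimal sketch of the view behaviour being discussed, added for clarity: slicing shares the underlying buffer instead of copying it, so writes through the view are visible in the original array.]

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = a[:, 2, :]                 # a view: no data is copied
b[0, 0] = 99                   # writing through the view...
print(a[0, 2, 0])              # ...changes the original array: 99
print(np.shares_memory(a, b))  # True: same underlying buffer
```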

So basically, in my case, I do not care about F_CONTIGUOUS vs
C_CONTIGUOUS, but I do care about contiguity. So my special case is
any contiguous array, whatever the ordering.
> This is all aside from alignment issues, of which I know nothing.
> > So the data buffer in numpy is not really a C array, but more like a
> > structure ? (I am  talking about the actual binary data)
> It is a C array of bytes, and probably of the given type, but not all of
> it is used for this particular array object, or, in the case of Fortran
> ordering, it's used in a different order than a conventional C n-d array.
> > "for a numpy array a of eg float32, am I guaranteed that
> > a->data[sizeof(float32) * i] for 0 <= i < a.size gives me all the items
> > of a, even for non contiguous arrays ?"
> I think you got the answer to this, but in case you didn't:
> No, only if it's contiguous (I'm not sure about alignment)
According to the numpy ebook, you need alignment for dereferencing
pointers. But I guess non-aligned arrays are not really common.
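[For what it's worth, here is one way to actually produce a non-aligned array, as an illustrative sketch: numpy sets the ALIGNED flag by checking the data pointer (and strides) against the dtype's alignment requirement, so starting the data one byte into a buffer typically clears it.]

```python
import numpy as np

buf = bytearray(8 * 10 + 1)
# start the float64 data 1 byte into the buffer: misaligned on most platforms
a = np.frombuffer(buf, dtype=np.float64, count=10, offset=1)
print(a.flags['ALIGNED'])
```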

> >> First, it's not that important if the array is contiguous for this
> >> > sort of thing. What you really care about is whether it's
> >> > simply-strided (or maybe single-strided would be a better term)
> but how do you get single strided? this is what always made it hard for
> me to know how to write this kind of code.
I am also interested in that. You could imagine arrays that are
neither C nor F contiguous but are still "packed" (produced by some
concatenation, for example?), and as such can be iterated easily.
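[Such an array is easy to build with a transpose of a 3-d array; a small illustrative sketch: every byte of the buffer still belongs to the array, even though neither contiguity flag is set.]

```python
import numpy as np

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
t = a.transpose(1, 0, 2)   # swap the first two axes: still a view of a

print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])  # False False
# yet t is "packed": its elements exactly cover the whole buffer
print(t.size * t.itemsize == a.nbytes)                   # True
```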

> This involves a fair bit of math for each index operation -- it could
> really slow down a simple function like clip.

Indeed. But again, slower operations on those arrays are to be
expected, and speeding them up is not all that important. At least, it
is not important for the case I want to speed up.

> Is this iterator as efficient as incrementing the index for contiguous
> arrays? i.e. is there any point of special casing contiguous arrays if
> you use it?

I think the problem is not so much incrementing the iterator as
getting contiguous memory access. For example, the iterator may not
guarantee traversing the array "contiguously" even when the array is
contiguous (I didn't look at the sources for the array iterator). If,
for example, you access a C-order array but iterate it the F way, it
will be much slower, because your computation becomes limited by main
memory's speed.
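[The effect can be seen from the addresses alone; a small sketch added for illustration: traversing a C-order array in C order visits strictly increasing addresses, while traversing it in F order jumps back and forth through memory, which defeats the cache.]

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)  # C order, strides (24, 8)

addr = lambda i, j: a.ctypes.data + i * a.strides[0] + j * a.strides[1]
c_order = [addr(i, j) for i in range(2) for j in range(3)]
f_order = [addr(i, j) for j in range(3) for i in range(2)]

print(c_order == sorted(c_order))  # True: sequential, cache-friendly
print(f_order == sorted(f_order))  # False: strided jumps through memory
```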

For some code related to linear predictive coding, I tried to take
into account C_CONTIGUOUS vs the other cases, and I realised that in
most cases it is not only easier but often faster to copy the data
into a new contiguous buffer and do the computation on that.
Basically, at least for those cases, it wasn't worth the effort to
handle the complicated cases (the memory buffers being not that big).
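[That copy-first strategy is a one-liner in numpy; a sketch for illustration: np.ascontiguousarray copies only when it has to, and hands back the input unchanged when it is already C-contiguous.]

```python
import numpy as np

a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
b = a[:, ::2, :]                     # a non-contiguous view of a

c = np.ascontiguousarray(b)          # copies b into a packed C-order buffer
print(b.flags['C_CONTIGUOUS'])       # False
print(c.flags['C_CONTIGUOUS'])       # True
print(np.ascontiguousarray(a) is a)  # True: no copy if already contiguous
```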