[Numpy-discussion] size_t or npy_intp?

Kurt Smith kwmsmith at gmail.com
Tue Jul 27 11:11:46 EDT 2010


On Tue, Jul 27, 2010 at 9:45 AM, Francesc Alted <faltet at pytables.org> wrote:
> On Tuesday 27 July 2010 15:20:47 Charles R Harris wrote:
>> On Tue, Jul 27, 2010 at 7:08 AM, Francesc Alted <faltet at pytables.org> wrote:
>> > Hi,
>> >
>> > I'm a bit confused about which datatype I should use when referring to
>> > NumPy ndarray lengths.  On the one hand, I'd use `size_t`, which is the
>> > canonical way to refer to lengths of memory blocks.  On the other hand,
>> > `npy_intp` seems to be the standard data type used in NumPy for this.
>>
>> They have different ranges: npy_intp is signed and, in later versions of
>> Python, is the same as Py_ssize_t, while size_t is unsigned. It would be a
>> bad idea to mix the two.
>
> Agreed that mixing the two is a bad idea.  So I suppose that you are
> suggesting using `npy_intp`.  But then, I'd say that `size_t`, being
> unsigned, is a better fit for describing a memory length.
>
> Mmh, I'll stick with `size_t` for the time being (unless anyone else can
> convince me that this is really a big mistake ;-)

This would be good to clear up; I've been confused on the issue myself
for my project.  The PyArrayObject struct is defined using
`npy_intp`s:

typedef struct PyArrayObject {
        PyObject_HEAD
        char *data;             /* pointer to raw data buffer */
        int nd;                 /* number of dimensions, also called ndim */
        npy_intp *dimensions;   /* size in each dimension */
        npy_intp *strides;      /* bytes to jump to get to the
                                   next element in each dimension */
        PyObject *base;         /* This object should be decref'd
                                   upon deletion of array */
                                /* For views it points to the original array */
                                /* For creation from buffer object it points
                                   to an object that should be decref'd on
                                   deletion */
                                /* For UPDATEIFCOPY flag this is an array
                                   to-be-updated upon deletion of this one */
        PyArray_Descr *descr;   /* Pointer to type structure */
        int flags;              /* Flags describing array -- see below */
        PyObject *weakreflist;  /* For weakreferences */
} PyArrayObject;

(numpy 1.4.1, numpy/core/include/numpy/ndarrayobject.h)

And because of that, Cython's numpy functionality uses `npy_intp`
everywhere.  Perhaps this is required for backward compatibility in numpy,
but in an ideal world, should those be `npy_uintp`s?
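
One thing that may bear on this: the `strides` entries are byte offsets,
and as far as I know they can be negative (a reversed view such as
`a[::-1]` walks the buffer backwards), which an unsigned type could not
represent.  A minimal sketch of that, using `intptr_t` as a stand-in for
`npy_intp`; the `intp_t` typedef and both helper functions are made up
for illustration and are not NumPy API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for npy_intp, a signed pointer-sized integer. */
typedef intptr_t intp_t;

/* Read element i of a 1-D array of doubles, given a pointer to its
   first element and a stride in bytes.  The stride may be negative. */
static double strided_get(const char *first, intp_t stride, intp_t i)
{
    assert(first != NULL);
    return *(const double *)(first + i * stride);
}

/* Read element i of the reversed view of buf[0..n-1]: start at the
   last element and step backwards by sizeof(double) bytes. */
static double reversed_get(const double *buf, intp_t n, intp_t i)
{
    const char *first = (const char *)(buf + (n - 1));
    intp_t stride = -(intp_t)sizeof(double);
    return strided_get(first, stride, i);
}
```

With an unsigned stride type, the `-(intp_t)sizeof(double)` step would
wrap around to a huge positive value instead of stepping backwards.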

Looking at the bufferinfo struct for the buffer protocol, it uses `Py_ssize_t`:

typedef struct bufferinfo {
     void *buf;
     Py_ssize_t len;
     int readonly;
     const char *format;
     int ndim;
     Py_ssize_t *shape;
     Py_ssize_t *strides;
     Py_ssize_t *suboffsets;
     Py_ssize_t itemsize;
     void *internal;
} Py_buffer;

So everyone is using signed values where it would make more sense (to
me at least) to use unsigned.  Any reason for this?

I'm using `npy_intp` since Cython does it that way :-)
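
Whichever type wins, the place where the two meet is where the mixing
problem mentioned above seems most likely to bite.  A rough sketch of
the trap and of one way to guard against it, again with `intptr_t`
standing in for `npy_intp`; the `intp_t` typedef and all three function
names are hypothetical, not part of NumPy or Python:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for npy_intp, a signed pointer-sized integer. */
typedef intptr_t intp_t;

/* Spells out what a mixed `a < b` does when a is converted to size_t:
   a negative a becomes a very large unsigned value. */
static int mixed_less(intp_t a, size_t b)
{
    return (size_t)a < b;
}

/* Handle the sign explicitly before converting: any negative value is
   less than every size_t value. */
static int checked_less(intp_t a, size_t b)
{
    return a < 0 || (size_t)a < b;
}

/* Convert a signed length to size_t at an API boundary (for example,
   before passing it to malloc), refusing negative values. */
static size_t length_to_size(intp_t len)
{
    assert(len >= 0);
    return (size_t)len;
}
```

Here `mixed_less(-1, 10)` reports that -1 is not less than 10, while
`checked_less(-1, 10)` gives the answer one would expect.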

Kurt


