
Hi, I'm a bit confused about which datatype I should use when referring to NumPy ndarray lengths. On the one hand, I'd use `size_t`, which is the canonical way to refer to the length of a memory block. On the other hand, `npy_intp` seems to be the standard data type used in NumPy for this. Which one would you recommend using in NumPy extensions? -- Francesc Alted

On Tue, Jul 27, 2010 at 7:08 AM, Francesc Alted <faltet@pytables.org> wrote:
Hi,
I'm a bit confused about which datatype I should use when referring to NumPy ndarray lengths. On the one hand, I'd use `size_t`, which is the canonical way to refer to the length of a memory block. On the other hand, `npy_intp` seems to be the standard data type used in NumPy for this.
They have different ranges: npy_intp is signed and, in later versions of Python, is the same as Py_ssize_t, while size_t is unsigned. It would be a bad idea to mix the two. Chuck
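
A minimal sketch of the pitfall Chuck describes (illustrative, not from the thread; `ptrdiff_t` stands in for `npy_intp`): in a mixed comparison the signed operand is converted to unsigned, so a negative stride or offset silently becomes a huge positive value.

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
        ptrdiff_t stride = -8;   /* stand-in for npy_intp; strides may be negative */
        size_t    len    = 100;  /* a size_t memory length */

        /* Signed comparison behaves as expected. */
        if (stride < (ptrdiff_t)len)
            printf("signed compare: -8 < 100 -> true\n");

        /* Mixed comparison: stride is converted to size_t and wraps to a
           huge value, so the test is false (compilers typically warn here). */
        if (stride < len)
            printf("never printed\n");
        else
            printf("mixed compare: (size_t)-8 = %zu -> false\n", (size_t)stride);

        return 0;
    }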

On Tuesday 27 July 2010 15:20:47, Charles R Harris wrote:
On Tue, Jul 27, 2010 at 7:08 AM, Francesc Alted <faltet@pytables.org> wrote:
Hi,
I'm a bit confused about which datatype I should use when referring to NumPy ndarray lengths. On the one hand, I'd use `size_t`, which is the canonical way to refer to the length of a memory block. On the other hand, `npy_intp` seems to be the standard data type used in NumPy for this.
They have different ranges: npy_intp is signed and, in later versions of Python, is the same as Py_ssize_t, while size_t is unsigned. It would be a bad idea to mix the two.
Agreed that mixing the two is a bad idea. So I suppose you are suggesting I use `npy_intp`. But then, I'd say that `size_t`, being unsigned, is a better fit for describing a memory length. Mmh, I'll stick with `size_t` for the time being (unless anyone else can convince me that this is really a big mistake ;-) -- Francesc Alted

Francesc Alted wrote:
On Tuesday 27 July 2010 15:20:47, Charles R Harris wrote:
On Tue, Jul 27, 2010 at 7:08 AM, Francesc Alted <faltet@pytables.org> wrote:
Hi,
I'm a bit confused about which datatype I should use when referring to NumPy ndarray lengths. On the one hand, I'd use `size_t`, which is the canonical way to refer to the length of a memory block. On the other hand, `npy_intp` seems to be the standard data type used in NumPy for this.
They have different ranges: npy_intp is signed and, in later versions of Python, is the same as Py_ssize_t, while size_t is unsigned. It would be a bad idea to mix the two.
Agreed that mixing the two is a bad idea. So I suppose you are suggesting I use `npy_intp`. But then, I'd say that `size_t`, being unsigned, is a better fit for describing a memory length.
Mmh, I'll stick with `size_t` for the time being (unless anyone else can convince me that this is really a big mistake ;-)
Well, Python has reasons for using Py_ssize_t (= ssize_t where available) internally for everything that has to do with indexing. (E.g. it wants to use the same type for the strides, which can be negative.) You just can't pass an index that doesn't fit in ssize_t to any Python API. You're free to use size_t in your own code, but if you actually use the extra bit, the moment it hits Python you'll overflow and get garbage... so you need to check every time you hit any Python layer, rather than only in the input to your code. Your choice, though. Dag Sverre
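
To make the "check every time you hit any Python layer" point concrete, here is a minimal sketch of the guard you end up writing if you keep lengths in `size_t` (the helper name is made up for illustration):

    #include <Python.h>

    /* Hypothetical helper: build a list whose length is tracked as
       size_t elsewhere in the extension. */
    static PyObject *
    make_list_of_len(size_t n)
    {
        if (n > (size_t)PY_SSIZE_T_MAX) {
            PyErr_SetString(PyExc_OverflowError,
                            "length does not fit in Py_ssize_t");
            return NULL;
        }
        return PyList_New((Py_ssize_t)n);   /* Python C API takes Py_ssize_t */
    }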

On Tue, Jul 27, 2010 at 9:45 AM, Francesc Alted <faltet@pytables.org> wrote:
On Tuesday 27 July 2010 15:20:47, Charles R Harris wrote:
On Tue, Jul 27, 2010 at 7:08 AM, Francesc Alted <faltet@pytables.org> wrote:
Hi,
I'm a bit confused about which datatype I should use when referring to NumPy ndarray lengths. On the one hand, I'd use `size_t`, which is the canonical way to refer to the length of a memory block. On the other hand, `npy_intp` seems to be the standard data type used in NumPy for this.
They have different ranges: npy_intp is signed and, in later versions of Python, is the same as Py_ssize_t, while size_t is unsigned. It would be a bad idea to mix the two.
Agreed that mixing the two is a bad idea. So I suppose you are suggesting I use `npy_intp`. But then, I'd say that `size_t`, being unsigned, is a better fit for describing a memory length.
Mmh, I'll stick with `size_t` for the time being (unless anyone else can convince me that this is really a big mistake ;-)
This would be good to clear up; I've been confused about the issue myself for my project. The PyArrayObject struct is defined using `npy_intp`s:

    typedef struct PyArrayObject {
        PyObject_HEAD
        char *data;             /* pointer to raw data buffer */
        int nd;                 /* number of dimensions, also called ndim */
        npy_intp *dimensions;   /* size in each dimension */
        npy_intp *strides;      /* bytes to jump to get to the next
                                   element in each dimension */
        PyObject *base;         /* This object should be decref'd upon
                                   deletion of array.
                                   For views it points to the original array.
                                   For creation from buffer object it points to
                                   an object that should be decref'd on deletion.
                                   For UPDATEIFCOPY flag this is an array
                                   to-be-updated upon deletion of this one */
        PyArray_Descr *descr;   /* Pointer to type structure */
        int flags;              /* Flags describing array -- see below */
        PyObject *weakreflist;  /* For weakreferences */
    } PyArrayObject;

(numpy 1.4.1, numpy/core/include/numpy/ndarrayobject.h)

And because of that, Cython's numpy functionality uses `npy_intp` everywhere. Perhaps this is required for backwards compat. in numpy, but in an ideal world, should those be `npy_uintp`s?

Looking at the bufferinfo struct for the buffer protocol, it uses `Py_ssize_t`:

    struct bufferinfo {
        void *buf;
        Py_ssize_t len;
        int readonly;
        const char *format;
        int ndim;
        Py_ssize_t *shape;
        Py_ssize_t *strides;
        Py_ssize_t *suboffsets;
        Py_ssize_t itemsize;
        void *internal;
    } Py_buffer;

So everyone is using signed values where it would make more sense (to me at least) to use unsigned. Any reason for this?

I'm using `npy_intp` since Cython does it that way :-)

Kurt
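
As a point of reference, a minimal sketch of how those `npy_intp` fields are typically consumed in extension code (the function is illustrative, not from the message); the accessor macros used here also return `npy_intp`:

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Illustrative only: multiply the npy_intp dimensions together,
       which is what PyArray_SIZE(arr) computes for you. */
    static npy_intp
    total_elements(PyArrayObject *arr)
    {
        npy_intp n = 1;
        int i;
        for (i = 0; i < PyArray_NDIM(arr); i++)
            n *= PyArray_DIM(arr, i);   /* dimensions[i], an npy_intp */
        return n;
    }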

Kurt Smith wrote:
Looking at the bufferinfo struct for the buffer protocol, it uses `Py_ssize_t`:
    struct bufferinfo {
        void *buf;
        Py_ssize_t len;
        int readonly;
        const char *format;
        int ndim;
        Py_ssize_t *shape;
        Py_ssize_t *strides;
        Py_ssize_t *suboffsets;
        Py_ssize_t itemsize;
        void *internal;
    } Py_buffer;
So everyone is using signed values where it would make more sense (to me at least) to use unsigned. Any reason for this?
I'm using `npy_intp` since Cython does it that way :-)
And Cython (and NumPy, I expect) does it that way because Python does it that way. And that really can't be changed. The reasons are mostly historical/for convenience. And once 64-bit is more widespread, do we really care about the one bit?

From PEP 353:

Why not size_t <http://www.python.org/dev/peps/pep-0353/#id9>

An initial attempt to implement this feature tried to use size_t. It quickly turned out that this cannot work: Python uses negative indices in many places (to indicate counting from the end). Even in places where size_t would be usable, too many reformulations of code were necessary, e.g. in loops like:

    for (index = length - 1; index >= 0; index--)

This loop will never terminate if index is changed from int to size_t.

Dag Sverre
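
A minimal sketch of the PEP 353 loop in context (illustrative, not from the message): with an unsigned index the test `index >= 0` is always true, so decrementing past zero wraps around instead of stopping, while the signed version terminates as intended.

    #include <stddef.h>

    /* Broken: index is unsigned, so "index >= 0" always holds and the
       decrement past zero wraps to SIZE_MAX (out-of-bounds access). */
    void zero_backwards_broken(double *buf, size_t length)
    {
        size_t index;
        for (index = length - 1; index >= 0; index--)
            buf[index] = 0.0;
    }

    /* Works: a signed index (ptrdiff_t here; Py_ssize_t/npy_intp in
       Python/NumPy code) goes to -1 and the loop stops. */
    void zero_backwards_signed(double *buf, ptrdiff_t length)
    {
        ptrdiff_t index;
        for (index = length - 1; index >= 0; index--)
            buf[index] = 0.0;
    }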

On Tuesday 27 July 2010 17:17:55, Dag Sverre Seljebotn wrote:
Kurt Smith wrote:
Looking at the bufferinfo struct for the buffer protocol, it uses `Py_ssize_t`:
    struct bufferinfo {
        void *buf;
        Py_ssize_t len;
        int readonly;
        const char *format;
        int ndim;
        Py_ssize_t *shape;
        Py_ssize_t *strides;
        Py_ssize_t *suboffsets;
        Py_ssize_t itemsize;
        void *internal;
    } Py_buffer;
So everyone is using signed values where it would make more sense (to me at least) to use unsigned. Any reason for this?
My reason was just to be consistent with the `malloc(size_t size)` signature (and the C world seems to use `size_t` widely for sizes).
I'm using `npy_intp` since Cython does it that way :-)
And Cython (and NumPy, I expect) does it that way because Python does it that way. And that really can't be changed.
The reasons are mostly historical/for convenience. And once 64-bit is more widespread, do we really care about the one bit?
From PEP 353:
Why not size_t <http://www.python.org/dev/peps/pep-0353/#id9>
An initial attempt to implement this feature tried to use size_t. It quickly turned out that this cannot work: Python uses negative indices in many places (to indicate counting from the end). Even in places where size_t would be usable, too many reformulations of code were necessary, e.g. in loops like:
for(index = length-1; index >= 0; index--)
This loop will never terminate if index is changed from int to size_t.
OK, I'm not going to break Python/NumPy conventions, so you've convinced me: I'll use `npy_intp` then. Thanks! -- Francesc Alted
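
For what it's worth, a minimal sketch of what that convention looks like in practice (function name and details are illustrative): lengths, strides and loop indices all stay `npy_intp` from the array struct down to the loop.

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Illustrative: sum a 1-d float64 array, keeping every length-like
       quantity in npy_intp rather than size_t. */
    static double
    sum_1d(PyArrayObject *arr)
    {
        npy_intp n      = PyArray_DIM(arr, 0);     /* length */
        npy_intp stride = PyArray_STRIDE(arr, 0);  /* may be negative */
        char *data      = PyArray_BYTES(arr);
        double total    = 0.0;
        npy_intp i;

        for (i = 0; i < n; i++)
            total += *(double *)(data + i * stride);
        return total;
    }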

On Tue, Jul 27, 2010 at 10:17 AM, Dag Sverre Seljebotn <dagss@student.matnat.uio.no> wrote:
From PEP 353:
Why not size_t <http://www.python.org/dev/peps/pep-0353/#id9>
An initial attempt to implement this feature tried to use size_t. It quickly turned out that this cannot work: Python uses negative indices in many places (to indicate counting from the end). Even in places where size_t would be usable, too many reformulations of code were necessary, e.g. in loops like:
for(index = length-1; index >= 0; index--)
This loop will never terminate if index is changed from int to size_t.
Of course. Makes sense; thanks for the clarification. Kurt

On Tue, Jul 27, 2010 at 10:08 PM, Francesc Alted <faltet@pytables.org> wrote:
Hi,
I'm a bit confused about which datatype I should use when referring to NumPy ndarray lengths. On the one hand, I'd use `size_t`, which is the canonical way to refer to the length of a memory block. On the other hand, `npy_intp` seems to be the standard data type used in NumPy for this.
npy_intp is the one to use ATM. I agree it is confusing (because intp_t and ssize_t are for different use cases); adding an npy_ssize_t and fixing the API accordingly is on my TODO list, but that's pretty low :) David
participants (5)
- Charles R Harris
- Dag Sverre Seljebotn
- David Cournapeau
- Francesc Alted
- Kurt Smith