[Numpy-discussion] Questions about the array interface.

Fri Apr 8 00:43:08 EDT 2005

On Thu, Apr 07, 2005 at 11:55:01PM -0700, Tim Hochberg wrote:
> Scott Gilbert wrote:
> 
> >--- Tim Hochberg <tim.hochberg at cox.net> wrote:
> > 
> >
> >>I think there is a trade off, but not the one that Chris is worried 
> >>about. It should be easy to hide complexity of dealing with missing 
> >>attributes through the various helper functions. The cost will be in 
> >>speed and will probably be most noticeable in C extensions using small 
> >>arrays where the extra code to check if an attribute is present will be 
> >>significant.
> >>
> >>How significant this will be, I'm not sure. And frankly I don't care all 
> >>that much since I generally only use large arrays. However, since one of 
> >>the big faultlines between Numarray and Numeric involves the former's 
> >>relatively poor small array performance, I suspect someone might care.
> >>
> >>   
> >>
> >
> >You must check the return value of the PyObject_GetAttr (or
> >PyObject_GetAttrString) calls regardless.  Otherwise the extension will die
> >with an ugly segfault the first time one passes a float where an array was
> >expected.
> >
> >If we're talking about small light-weight arrays and a C/C++ function that
> >wants to work with them very efficiently, I'm not convinced that requiring
> >the attributes be present will make things faster.
> >
> >
> >As we're talking about small light weight arrays, it's unlikely the
> >individual arrays will have __array_shape__ or __array_strides__ already
> >stored as tuples.  They'll probably store them as a C array as part of
> >their PyObject structure.
> >
> >
> >In the world where some of these attributes are optional:  If an attribute
> >like __array_offset__ or __array_shape__ isn't present, the C code will
> >know to use zero or the default C-contiguous layout.  So the check failed,
> >but the failure case is probably very fast (since a temporary tuple object
> >doesn't have to be built by the array on the fly).
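
To make the "default C-contiguous layout" above concrete, here is a minimal sketch in plain C of what a consumer could fall back to when the strides and offset attributes are absent. The function names default_c_strides and element_offset are hypothetical, and plain long long stands in for Py_LONG_LONG so the snippet compiles without the Python headers; it assumes strides are measured in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Fill strides[] (in bytes) for a C-contiguous (row-major) array:
 * the last axis varies fastest, so its stride is one item.
 * Hypothetical helper, not part of any existing API. */
void default_c_strides(int ndims, const long long *shape,
                       size_t itemsize, long long *strides)
{
    long long step = (long long)itemsize;
    int i;

    assert(ndims >= 0);
    for (i = ndims - 1; i >= 0; i--) {
        strides[i] = step;   /* bytes to skip per step along axis i */
        step *= shape[i];    /* the next outer axis spans this whole axis */
    }
}

/* Byte offset of the element at index[] from the start of the buffer.
 * Pass offset = 0 when the offset attribute is missing. */
long long element_offset(int ndims, const long long *index,
                         const long long *strides, long long offset)
{
    long long pos = offset;
    int i;

    for (i = 0; i < ndims; i++)
        pos += index[i] * strides[i];
    return pos;
}
```

For a shape of (2, 3, 4) with 8-byte items this yields strides of (96, 32, 8), and the element at index (1, 2, 3) sits 184 bytes from the start of the buffer.
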
> >
> >In the world where all of the attributes are required:  The array object
> >will have to generate the __array_offset__ int/long or __array_shape__
> >tuple from its own internal representation.  Then the C/C++ consumer code
> >will bust apart the tuple to get the values.  So the check succeeded, but
> >the success code needs to grab the parts of the tuple.
> >
> >The C helper code could look like:
> 
> I'm not convinced it's legit to assume that a failure to get the 
> attribute means that it's not present and call PyErr_Clear. Just as a 
> for instance, what if the attribute in question is implemented as a 
> descriptor in which there is some internal error. Then you're burying the 
> error and most likely doing the wrong thing. As far as I can tell, the 
> only correct way to do this is to use PyObject_HasAttrString, then 
> PyObject_GetAttrString if that succeeds.

No point: PyObject_HasAttrString *calls* PyObject_GetAttrString, then
clears the error if there is one.

[Side note: hasattr() in Python works the same way, which is why using
properties is a pain when you've got code that's using it]

> The point about not passing around the tuples probably being faster is a 
> good one. Another thought is that requiring tuples instead of general 
> sequences would make the helper faster (since one could use 
> PyTuple_GET_ITEM, which I believe is much faster than 
> PySequence_GetItem). This would possibly shift more pain onto the 
> implementer of the object though. I suspect that the best strategy, 
> orthogonal to requiring all attributes or not, is to use PySequence_Fast 
> to get a fast sequence and work with that. This means that objects that 
> return tuples for strides, etc would run at maximum possible speed, 
> while other sequences would still work.

How about objects that use a lightweight array as the strides sequence?
I'm thinking that if you've got a fast 1-d array object, you'd be
tempted to use an instance of that as the shape or strides attribute.
You'd be saving on temporary tuple creation (but you'd still be losing
some in making Python ints).

I haven't benchmarked it, but I'm looking at the code for
PySequence_GetItem(): it does a few pointer dereferences to get the
sq_item() method in the tp_as_sequence struct of an object implementing
the sequence protocol, which for the tuple does an array indexing of the
tuple's data. You've got about two function calls more compared
to using PyTuple_GET_ITEM.
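
As a rough illustration of where those extra calls come from, here is a toy model of the two access paths in plain C. This is not CPython's actual code: every Toy* name is made up, the items are plain ints rather than object pointers (so there is no reference counting), and assert stands in for the error returns the real functions use.

```c
#include <assert.h>
#include <stddef.h>

struct ToyObject;

/* Stand-in for a per-type sequence method table (hypothetical). */
typedef struct {
    int (*sq_item)(struct ToyObject *self, size_t i);
} ToySeqMethods;

/* Stand-in for a type object; as_sequence is NULL for non-sequences. */
typedef struct {
    const ToySeqMethods *as_sequence;
} ToyType;

typedef struct ToyObject {
    const ToyType *type;
    size_t size;
    int items[4];
} ToyObject;

/* Direct path: the caller already knows the concrete layout,
 * so this is just an array index with no call at all. */
#define TOY_TUPLE_GET_ITEM(op, i) ((op)->items[(i)])

/* The type's own sq_item slot: bounds check, then array indexing. */
int toy_tuple_item(ToyObject *self, size_t i)
{
    assert(i < self->size);
    return self->items[i];
}

const ToySeqMethods toy_tuple_as_sequence = { toy_tuple_item };
const ToyType toy_tuple_type = { &toy_tuple_as_sequence };

/* Generic path: object -> type -> method table -> function pointer.
 * One call to get here, a second call through the slot. */
int toy_sequence_get_item(ToyObject *o, size_t i)
{
    const ToySeqMethods *m = o->type->as_sequence;

    assert(m != NULL && m->sq_item != NULL);
    return m->sq_item(o, i);
}
```

Both paths return the same element; the generic one pays for two calls and a few dereferences on top of the array index that the macro does inline.
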

It really depends on how big the arrays you expect to be passed are.
If they're big, this is all amortized: you'll hardly see it.
It also depends on how your routines get used. If the routine is buried
below a few layers of API, you'd likely be better off doing a typecast
higher up to your own representation, or something. If it's at the
border, so the user will call it directly *often*, you're going to be
screwed for speed anyways (giving the user the option of casting arrays
to something else would probably help a lot here also).

> Back to requiring attributes or not. I suspect that the fastest correct 
> way is to require all attributes, but allow them to be None, in which 
> case the default value is used. Then any errors are easily bubbled up 
> and a fast check for None chooses whether to use the defaults or not.
> 
> It's late, so I hope that's not too incoherent. Or too wrong.
> 
> Oh, one other nitpicky thing, I think PyLong_AsLongLong needs some sort 
> of error checking (it can allegedly raise errors). I suppose that means 
> one is supposed to call PyErr_Occurred after every call? That's sort 
> of painful!

Yes! Check all C API functions that may return errors! That includes
PySequence_GetItem() and PyLong_AsLongLong.

> >   struct PyNDArrayInfo {
> >       int ndims;
> >       int endian;
> >       char itemcode;
> >       size_t itemsize;
> >       Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */
> >       Py_LONG_LONG offset;
> >       Py_LONG_LONG strides[40];
> >       /* More Array Info goes here */
> >   };
> >
> >   int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
> >       PyObject* shape;
> >       PyObject* offset;
> >       PyObject* strides;
> >       int ii, len;
> >
> >       info->itemsize = too_long_for_this_example(obj);
> >
> >       shape = PyObject_GetAttrString(obj, "__array_shape__");
> >       if (!shape) return 0;
> >       len = PySequence_Size(shape);
> >       if (len < 0) return 0;
> >       if (len > 40) return 0; /* This needs work */
> >       info->ndims = len;
> >       for (ii = 0; ii<len; ii++) {
> >           PyObject* val = PySequence_GetItem(shape, ii);

Like here
> >           info->shape[ii] = PyLong_AsLongLong(val);
and here
> >           Py_DECREF(val);
(if you don't check PySequence_GetItem -- not a good idea anyways --
this should be Py_XDECREF)

[snip more code that needs checks :-)]

> >I have no idea how expensive PyErr_Clear() is.  We'd have to profile it to
> >see for certain.  If PyErr_Clear() is not expensive, then we could make a
> >strong argument that *not* requiring the attributes will be more efficient.

Not much; it's about three Py_XDECREF's.

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca