[Numpy-discussion] Questions about the array interface.

Thu Apr 7 21:07:02 EDT 2005

--- Tim Hochberg <tim.hochberg at cox.net> wrote:
>
> I think there is a trade off, but not the one that Chris is worried 
> about. It should be easy to hide complexity of dealing with missing 
> attributes through the various helper functions. The cost will be in 
> speed and will probably be most noticeable in C extensions using small
> arrays where the extra code to check if an attribute is present will be
> significant.
>
> How significant this will be, I'm not sure. And frankly I don't care all
> that much since I generally only use large arrays. However, since one of 
> the big faultlines between Numarray and Numeric involves the former's 
> relatively poor small array performance, I suspect someone might care.
> 

You must check the return value of the PyObject_GetAttr (or
PyObject_GetAttrString) calls regardless.  Otherwise the extension will die
with an ugly segfault the first time someone passes a float where an array
was expected.

If we're talking about small lightweight arrays and a C/C++ function that
wants to work with them very efficiently, I'm not convinced that requiring
the attributes be present will make things faster.

As we're talking about small lightweight arrays, it's unlikely the
individual arrays will have __array_shape__ or __array_strides__ already
stored as tuples.  They'll probably store them as a C array as part of
their PyObject structure.

In the world where some of these attributes are optional:  If an attribute
like __array_offset__ or __array_strides__ isn't present, the C code will
know to use zero or the default C-contiguous layout.  So the check failed,
but the failure case is probably very fast (since a temporary tuple object
doesn't have to be built by the array on the fly).
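
To make the "default C-contiguous layout" concrete, here is a minimal
sketch of what the consumer would compute when __array_strides__ is absent.
The function name default_c_strides is hypothetical, and it assumes strides
are measured in bytes:

```c
#include <stddef.h>

/* Hypothetical helper, not part of any proposed API: fill in C-contiguous
 * strides (in bytes) for an array that does not supply __array_strides__.
 * The last axis varies fastest, so its stride is a single item. */
void default_c_strides(int ndims, const long long *shape,
                       size_t itemsize, long long *strides)
{
    long long size = (long long)itemsize;
    int ii;

    /* Walk the axes from last to first, accumulating the block size. */
    for (ii = ndims - 1; ii >= 0; ii--) {
        strides[ii] = size;
        size *= shape[ii];
    }
}
```

For a shape of (2, 3, 4) with 8-byte items this gives strides of
(96, 32, 8), and no Python objects are created along the way.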

In the world where all of the attributes are required:  The array object
will have to generate the __array_offset__ int/long or __array_shape__
tuple from its own internal representation.  Then the C/C++ consumer code
will bust apart the tuple to get the values.  So the check succeeded, but
the success code needs to grab the parts of the tuple.

The C helper code could look like:

    /* The typedef lets plain C code write PyNDArrayInfo* below. */
    typedef struct PyNDArrayInfo {
        int ndims;
        int endian;
        char itemcode;
        size_t itemsize;
        Py_LONG_LONG shape[40]; /* assume 40 is the max for now... */
        Py_LONG_LONG offset;
        Py_LONG_LONG strides[40];
        /* More Array Info goes here */
    } PyNDArrayInfo;

    int PyNDArray_GetInfo(PyObject* obj, PyNDArrayInfo* info) {
        PyObject* shape;
        PyObject* offset;
        PyObject* strides;
        int ii, len;

        info->itemsize = too_long_for_this_example(obj);

        shape = PyObject_GetAttrString(obj, "__array_shape__");
        if (!shape) return 0;
        len = PySequence_Size(shape);
        if (len < 0 || len > 40) { /* The limit of 40 needs work */
            Py_DECREF(shape); /* don't leak the reference on failure */
            return 0;
        }
        info->ndims = len;
        for (ii = 0; ii<len; ii++) {
            PyObject* val = PySequence_GetItem(shape, ii);
            if (!val) { Py_DECREF(shape); return 0; }
            info->shape[ii] = PyLong_AsLongLong(val);
            Py_DECREF(val);
        }
        Py_DECREF(shape);

        offset = PyObject_GetAttrString(obj, "__array_offset__");
        if (offset) {
            /*** THIS PART MIGHT BE SLOWER WHEN IT SUCCEEDS ***/
            info->offset = PyLong_AsLongLong(offset);
            Py_DECREF(offset);
        } else {
            PyErr_Clear();
            info->offset = 0;
        }

        strides = PyObject_GetAttrString(obj, "__array_strides__");
        if (strides) {
            /*** THIS PART IS ALMOST CERTAINLY SLOWER ***/
            for (ii = 0; ii<info->ndims; ii++) {
                PyObject* val = PySequence_GetItem(strides, ii);
                if (!val) { Py_DECREF(strides); return 0; }
                info->strides[ii] = PyLong_AsLongLong(val);
                Py_DECREF(val);
            }
            Py_DECREF(strides);
        } else {
            /*** THIS FAILURE PATH IS PROBABLY FASTER ***/
            size_t size = info->itemsize;
            PyErr_Clear();
            /* Default C-contiguous strides: the last axis varies fastest */
            for (ii = info->ndims-1; ii>=0; ii--) {
                info->strides[ii] = size;
                size *= info->shape[ii];
            }
        }

        /* More code goes here */
        return 1; /* success; every failure path above returns 0 */
    }
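
Once PyNDArray_GetInfo has filled in the struct, the consumer never touches
a Python object again.  As a rough sketch of that payoff, assuming the
offset and strides are both in bytes (element_byte_offset is a hypothetical
name, not part of the proposal):

```c
/* Hypothetical consumer-side helper: byte position of the element at
 * index[0..ndims-1], using only the plain C values gathered earlier.
 * Assumes offset and strides are expressed in bytes. */
long long element_byte_offset(int ndims, long long offset,
                              const long long *strides,
                              const long long *index)
{
    long long pos = offset;
    int ii;

    /* Each axis contributes index * stride bytes. */
    for (ii = 0; ii < ndims; ii++)
        pos += index[ii] * strides[ii];
    return pos;
}
```

With strides of (96, 32, 8) and a zero offset, element (1, 2, 3) sits 184
bytes into the buffer.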

I have no idea how expensive PyErr_Clear() is.  We'd have to profile it to
see for certain.  If PyErr_Clear() is not expensive, then we could make a
strong argument that *not* requiring the attributes will be more efficient.

It could also be so close that it doesn't matter - in which case it's back
to being a matter of taste...

Cheers,
    -Scott