[Numpy-discussion] Questions about the array interface.

David M. Cooke cookedm at physics.mcmaster.ca
Thu Apr 7 00:55:37 EDT 2005


"Chris Barker" <Chris.Barker at noaa.gov> writes:

> Travis Oliphant wrote:
>
>> You should account for the '<' or '>' that might be present in
>> __array_typestr__   (Numeric won't put it there, but scipy.base and
>> numarray will---since they can have byteswapped arrays internally).
>
> Good point, but a pain. Maybe they should be required, that way I
> don't have to first check for the presence of '<' or '>', then check
> if they have the right value.

I'll second this. Pulling out more Python Zen: Explicit is better than implicit.
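
To show the check being discussed, here is a rough sketch of how a consumer might split the type string today, with the byte-order character optional. The helper name split_typestr is made up for illustration, and treating a missing marker as "native byte order" is my assumption, not something the protocol document promises.

```python
import sys

# Byte-order character for the machine we are running on.
NATIVE = '<' if sys.byteorder == 'little' else '>'

def split_typestr(typestr):
    """Split e.g. '<i4', '>f8' or 'i4' into (byteorder, kind, itemsize)."""
    if typestr[:1] in ('<', '>'):
        order, rest = typestr[0], typestr[1:]
    else:
        # Assumption: no marker means native byte order.
        order, rest = NATIVE, typestr
    return order, rest[0], int(rest[1:])

def needs_byteswap(typestr):
    """True if the data is stored in the non-native byte order."""
    return split_typestr(typestr)[0] != NATIVE
```

If the marker were required, the else branch (and the guess it encodes) would simply go away.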

>> A more generic interface would handle multiple integer types if
>> possible
>
> I'd like to support doubles as well...
>
>> (but this is a good start...)
>
> Right. I want to get _something_ working, before I try to make it universal!
>
>> I think one idea here is that if __array_strides__ returns None,
>> then C-style contiguousness is assumed.   In fact, I like that idea
>> so much that I just changed the interface.  Thanks for the
>> suggestion.
>
> You're welcome. I like that too.
>
>> No, they won't always be there for SciPy arrays (currently 4 of them
>> are).  Only record-arrays will provide __array_descr__ for example
>> and __array_offset__ is unnecessary for SciPy arrays.  I actually
>> don't much like the __array_offset__  parameter myself, but Scott
>> convinced me that it could be useful for very complicated
>> array classes.
>
> I can see that it would, but then, we're stuck with checking for all
> these optional attributes. If I don't bother to check for it, one day,
> someone is going to pass a weird array in with an offset, and a
> strange bug will show up.

Here's a summary:

Attributes           required by            required
                     array-like object      to be checked
__array_shape__           yes                   yes
__array_typestr__         yes                   yes
__array_descr__           no                    no
__array_data__            no                    yes
__array_strides__         no                    yes
__array_mask__            no                    no?
__array_offset__          no                    yes

For the "required to be checked" column, I'm assuming a user of the array
who is interested in looking at all of the elements, so we have to
consider every situation where forgetting an attribute could lead to
invalid memory accesses. __array_strides__ and __array_offset__ in
particular could be troublesome if forgotten.

The __array_mask__ element is difficult: for most applications, you
should check it, and raise an error if it exists and is not None, unless
you can handle missing elements. It's certainly not required that every
user of an array object understand all array types!

Since we have to check a bunch of them anyway, I think that's a good
enough reason to require that they exist. There are suitable defaults
defined in the protocol document (__array_strides__ in particular) that
make it easy to add them in simple cases.
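
As a sketch of what the "required to be checked" column means in practice, a consumer that only wants plain contiguous arrays could look something like this. The function name require_simple_array is hypothetical, and the defaults (None for strides and mask, 0 for offset) are the ones discussed in this thread, so treat them as assumptions if the protocol document changes.

```python
def require_simple_array(obj):
    """Return (shape, typestr, data), or raise TypeError if obj is not
    a plain C-contiguous, unmasked, zero-offset array."""
    # Required attributes: let AttributeError propagate if they are missing.
    shape = obj.__array_shape__
    typestr = obj.__array_typestr__

    # Optional attributes: a missing one is treated like its default.
    if getattr(obj, '__array_strides__', None) is not None:
        raise TypeError("strided arrays are not supported")
    if getattr(obj, '__array_offset__', 0) != 0:
        raise TypeError("arrays with a nonzero offset are not supported")
    if getattr(obj, '__array_mask__', None) is not None:
        raise TypeError("masked arrays are not supported")

    # If there is no separate data object, the object itself is the buffer.
    data = getattr(obj, '__array_data__', obj)
    return shape, typestr, data
```

Skipping any one of the three getattr() checks is exactly the "strange bug" case: the code still runs, but reads the wrong memory.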

>> So, the correct consumer usage for grabbing the data is
>> data = getattr(obj, '__array_data__', obj)
>
> Ah! I hadn't noticed the default parameter to getattr(). That makes it
> much easier. Is there an equivalent in C? It doesn't look like it to
> me, but I'm kind of a newbie with the C API.

You'd want something like

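/* PyObject_GetAttrString returns a new reference, or NULL on failure. */
adata = PyObject_GetAttrString(array_obj, "__array_data__");
if (!adata) {
    /* Lookup failed: clear the error and fall back to the object itself. */
    PyErr_Clear();
    adata = array_obj;
    Py_INCREF(adata);  /* so adata is an owned reference on both paths */
}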

>> int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int
>> *buffer_len)
>
> I'm starting to get this.
>
>> Of course this approach has the 32-bit limit until we get this
>> changed in Python.
>
> That's the least of my worries!
>
>>> 6) Should __array_offset__ be optional? I'd rather it were
>>> required, but  default to zero. This way I have to check for it,
>>> then use it. Also, I assume it is an integer number of bytes, is
>>> that right?
>> A consumer has to check for most of the optional stuff if they want
>> to support all types of arrays.
>
> That's not quite true. I'm happy to support only the simple types of
> arrays (contiguous, single type elements, zero offset), but I have to
> check all that stuff to make sure that I have a simple array. The
> simplest arrays are the most common case, they should be as easy as
> possible to support.
>
>> Again a simple:
>> getattr(obj, '__array_offset__', 0)
>> works fine.
>
> not too bad.
>
> Also, what if we find the need for another optional attribute later?
> Any older code won't check for it. Or maybe I'm being paranoid....

This is a good point; all good protocols embed a version somewhere.
Not doing it now could lead to grief/pain later.

I'd suggest adding to __array_data__: If __array_data__ is None, then
the array is implementing a newer version of the interface, and you'd
either need to support that (maybe the new version uses
__array_data2__ or something), or use the sequence protocol on the
original object. The sequence protocol should definitely be safe all
the time, whereas the buffer protocol may not. (Put it this way: I
understand the sequence protocol well, but not the buffer one :-)

That would also be a good argument for it existing, I think.
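
On the consumer side, that convention might look roughly like the following. This is only a sketch of the proposal, not anything in the protocol document; get_data_or_sequence is a made-up name, and the 'buffer'/'sequence' tags are just for illustration.

```python
def get_data_or_sequence(obj):
    """Return ('buffer', data) when the buffer route is usable, or
    ('sequence', obj) when the caller should fall back to indexing."""
    data = getattr(obj, '__array_data__', obj)
    if data is None:
        # Proposed convention: None marks a newer interface version
        # that this consumer does not understand.
        return 'sequence', obj
    return 'buffer', data
```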

Alternatively, we could add an __array_version__ attribute (required
to exist, required to check) which is set to 1 for this protocol.
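
A minimal sketch of that check, assuming the attribute is a plain integer and that a consumer refuses anything it does not recognize (check_array_version and SUPPORTED_ARRAY_VERSION are hypothetical names):

```python
SUPPORTED_ARRAY_VERSION = 1

def check_array_version(obj):
    """Raise TypeError unless obj declares the protocol version we support."""
    version = getattr(obj, '__array_version__', None)
    if version is None:
        raise TypeError("object does not declare __array_version__")
    if version != SUPPORTED_ARRAY_VERSION:
        raise TypeError("unsupported array interface version: %r" % (version,))
    return version
```

Older code would then fail loudly on a newer array instead of silently ignoring an attribute it has never heard of.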

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca



