[Numpy-discussion] Questions about the array interface.

Thu Apr 7 14:13:32 EDT 2005

--- Chris Barker <Chris.Barker at noaa.gov> wrote:
> 
> Again, I'm uncomfortable with something that I have to check being 
> optional. If it is, we're encouraging people to not check it, and that's 
> a recipe for bugs later on down the road.
> 
[snip]
>
> > I guess all I'm saying is that I wouldn't assume the offset is zero...
> 
> Good point. All the more reason to have the offset be mandatory.
>

Lots of protocols have optional parts.

The helper functions would hide this level of detail.
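
As a minimal sketch of how a helper could hide an optional attribute, assuming
the offset is exposed under the name __array_offset__ (that name, and the
helper array_offset itself, are illustrative assumptions rather than a
finished API):

```python
def array_offset(obj):
    """Return the byte offset into obj's data buffer, or 0 if none is given."""
    # __array_offset__ is an assumed attribute name; producers may omit it.
    return getattr(obj, '__array_offset__', 0)


class NoOffset:
    # Producer that leaves the optional attribute out entirely.
    __array_data__ = b'\x00\x00\x00\x01'


class WithOffset:
    # Producer whose real data starts four bytes into the buffer.
    __array_data__ = b'junk\x00\x00\x00\x01'
    __array_offset__ = 4


print(array_offset(NoOffset()))
print(array_offset(WithOffset()))
```

A consumer that always calls the helper never sees whether the producer
supplied the attribute, so it cannot forget to check.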

> 
> Yes, if there is a C/C++ version of all these helper functions, I'll be 
> a lot happier. And you're right, the same information should not be 
> encoded in two places, so my "iscontiguous" attribute should be a helper 
> function or maybe a method.
> 
>  > In a short while, you shouldn't have to check any __array_metadata__
>  > attributes directly.  There should even be a helper function for
>  > getting the array elements.
> 
> Cool. How would that work? A C++ iterator? I'm thinking not, as this is 
> all C, no?
> 

I think this will take shape as an include file with static/inline
functions.  No linking required, just #include <ndarray.h> and call the
functions.  It would be nice but not necessary that this was distributed
with Python.

I would be in favor of having some C++ iterator interfaces (possibly a
template class) inside of a #ifdef __cplusplus block.  Python doesn't seem
to have a lot of C++ in the core, so I wonder if this would meet resistance
(even when it's inside of a #ifdef block).

>
>  > It wouldn't be a horrible mistake to have all the attributes be 
>  > mandatory, but it doesn't get array consumers any benefit that they
>  > can't get from a well written helper library, and it does add some
>  > burden to array producers.
> 
> Hardly any. I'm assuming that there will be a base_array class that can 
> be used as a base class or mixin, so it wouldn't be any work at all to 
> have a full set of attributes with defaults. It would take up a little 
> bit of memory. I'm assuming that the whole point of this is to support 
> large datasets, but maybe that isn't a valid assumption. After all, 
> small array support has turned out to be very important for Numeric.
> 

If the protocol can make things easy without the use of a mixin or base
class, all the better to my way of thinking.  I don't think the memory use
is very relevant as the attributes would only require storage in the class
object, not the instances.
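
To illustrate the memory point, here is a small sketch in which defaults live
on a class and only the per-instance data is stored on each instance.  The
class names and the __array_offset__ and __array_strides__ attribute names are
assumptions made for the example:

```python
class ArrayDefaults:
    # Hypothetical defaults, stored once on the class object.
    __array_offset__ = 0
    __array_strides__ = None


class Samples(ArrayDefaults):
    def __init__(self, data):
        # Only this attribute is stored on each instance.
        self.__array_data__ = data


s = Samples(b'\x00' * 8)
print(sorted(vars(s)))       # names stored on the instance itself
print(s.__array_offset__)    # found on the class through normal lookup
```

Attribute lookup falls back to the class, so every instance sees the defaults
without carrying its own copy.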

There is something elegant about making array creation as easy as:

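    class easy_array:
        def __init__(self, filename):
            # Read raw bytes; binary mode avoids newline translation.
            with open(filename, 'rb') as f:
                data = f.read()
            self.__array_data__ = data
            # Four bytes per element, matching the 'i4' in the type string.
            self.__array_shape__ = (len(data) // 4,)
            self.__array_typestr__ = '>i4'   # big-endian 32-bit signed integer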

Like I said, I don't think it would be *horrible* to require all the
attributes, but I don't see how it will benefit you at all.  And even if
all the attributes are mandatory, there are still a number of details to
get right in reading the memory.  You'll likely want to use the helper
libraries/modules regardless.  (Once they're completed of course...)
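
A rough sketch of the kind of detail such a helper would have to handle for a
one-dimensional array: byte order, item size, and an optional offset.  The
function name array_elements, the __array_offset__ attribute, and the choice
to support only signed integers and floats are all assumptions made for
illustration, not the actual helper library:

```python
import struct

# Map (kind, item size) from the type string to a struct format code.
_CODES = {('i', 1): 'b', ('i', 2): 'h', ('i', 4): 'i', ('i', 8): 'q',
          ('f', 4): 'f', ('f', 8): 'd'}


def array_elements(obj):
    """Return the elements of a 1-D array-like object as a tuple."""
    typestr = obj.__array_typestr__            # e.g. '>i4'
    byteorder, kind, size = typestr[0], typestr[1], int(typestr[2:])
    if byteorder == '|':                       # byte order not applicable
        byteorder = '='
    code = _CODES[(kind, size)]                # KeyError for unsupported types
    (count,) = obj.__array_shape__             # this sketch is 1-D only
    offset = getattr(obj, '__array_offset__', 0)   # assumed optional attribute
    fmt = byteorder + str(count) + code
    return struct.unpack_from(fmt, obj.__array_data__, offset)


class Example:
    # Three big-endian 32-bit integers packed into a bytes object.
    __array_data__ = struct.pack('>3i', 1, -2, 3)
    __array_shape__ = (3,)
    __array_typestr__ = '>i4'


print(array_elements(Example()))
```

Strides, multiple dimensions, and the remaining type kinds are left out here;
those are the sort of details a shared helper would get right once for every
consumer.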

>
> As a rule of thumb, I think there will be [more] consumers of arrays
> than producers, so I'd rather make it easy on the consumers than the 
> producers, if we need to make such a trade off. Maybe I'm biased, 
> because I'm a consumer.
>

I don't see the trade off.  With a helper library it will be easy for you
either way, but mandatory attributes make it harder for array producers
(admittedly only a little).

This has to be easier than the situation you have today, right?  Imagine the
code you'd have to write to special case Numeric, scipy.base, Numarray, and
Python's array module.

Cheers,
    -Scott