[Numpy-discussion] Array Protocol change for Python 2.6

Tim Hochberg tim.hochberg at cox.net
Fri Jun 9 23:58:50 EDT 2006


David M. Cooke wrote:

>On Fri, 09 Jun 2006 16:03:32 -0700
>Andrew Straw <strawman at astraw.com> wrote:
>
>>Tim Hochberg wrote:
>>
>>>Which of the following should we require for an object to be "supporting 
>>>the array interface"? Here a producer is something that supplies 
>>>array_struct or array_interface (where the latter is the Python level 
>>>version of the former as per recent messages). Consumers do something 
>>>with the results.
>>>
>>>  1. Producers can supply either array_struct (if implemented in C) or
>>>     array_interface (if implemented in Python). Consumers must accept
>>>     both.
>>>  2. Producers must supply both array_struct and array_interface.
>>>     Consumers may accept either.
>>>  3. Producers must supply both array_struct and array_interface.
>>>     Consumers must accept both as well.
>>> 
>>I haven't been following as closely as I could, but is the following a 
>>possibility?
>>    4. Producers can supply either array_struct or array_interface. 
>>Consumers may accept either. The intermediate is a small, standalone 
>>(does not depend on NumPy) extension module that does automatic 
>>translation if necessary by providing two functions: as_array_struct() 
>>(which returns a CObject) and as_array_interface() (which returns a 
>>tuple/dict/whatever).
>>
>
>For something to go in the Python standard library this is certainly
>possible. Heck, if it's in the standard library we can have one attribute
>which is a special ArrayInterface object, which can be queried from both
>Python and C efficiently.
>
>For something like numpy (where we don't require a special object: the
>"producer" and "consumers" in Tim's terminology could be Numeric and
>numarray, for instance), we don't want a 3rd-party dependence. There's one
>case that I mentioned in another email:
>
>5. Producers must supply array_interface, and may supply array_struct.
>Consumers can use either.
>
>Requiring array_struct means that Python-only modules can't play along, so I
>think it should be optional (of course, if you're concerned about speed, you
>would provide it).
>
>Or maybe we should revisit the "no external dependencies". Perhaps one module
>would make everything easier, with helper functions and consistent handling
>of special cases. Packages wouldn't need it if they don't interact: you could
>conditionally import it when __array_interface__ is requested, and fail if
>you don't have it. It would just be required if you want to do sharing.
>
Here's another idea: move array_struct *into* array_interface. That is, 
array_interface becomes a dictionary with the following items:

    shape : sequence specifying the shape
    typestr : the typestring
    descr: you get the idea
    strides: ...
    mask: ...
    offset: ...
    data: A buffer object
    struct: the array_struct or None.
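
A minimal sketch of that layout from the producer and consumer side. The class
name ListProducer, the helper fetch(), and the concrete values chosen for
descr, strides, mask and offset are assumptions made for illustration, not
part of any agreed protocol. A memoryview stands in for the buffer object.

```python
import array
import sys


class ListProducer:
    """Hypothetical pure-Python producer using the combined dictionary layout."""

    def __init__(self, values):
        self._storage = array.array("d", values)  # contiguous C doubles

    @property
    def array_interface(self):
        byteorder = "<" if sys.byteorder == "little" else ">"
        typestr = "%sf%d" % (byteorder, self._storage.itemsize)
        return {
            "shape": (len(self._storage),),
            "typestr": typestr,
            "descr": [("", typestr)],           # illustrative value
            "strides": None,                    # assumed: None means contiguous
            "mask": None,                       # assumed: None means no mask
            "offset": 0,
            "data": memoryview(self._storage),  # stand-in for a buffer object
            "struct": None,                     # no C-level array_struct here
        }


def fetch(obj):
    """Return ("struct", s) when a struct is present, else ("dict", iface)."""
    iface = obj.array_interface   # lookup 1: attribute access
    struct = iface["struct"]      # lookup 2: dictionary item
    if struct is not None:
        return "struct", struct
    return "dict", iface
```

A Python-only producer simply leaves struct as None, and the consumer falls
back to the remaining dictionary items.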

The downside is that you have to do two lookups to get the array_struct, 
and that should be the fast path. A partial solution is to instead have 
array_interface be a super_tuple similar to the result of os.stat. This 
should be faster since tuple is quite fast to index if you know what 
index you want.
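
A rough sketch of the tuple variant, assuming collections.namedtuple as a
stand-in for the kind of object os.stat returns. The field order and the
names ArrayInterface, STRUCT and get_struct are hypothetical.

```python
from collections import namedtuple

# Hypothetical field order; a real protocol would have to fix it permanently.
ArrayInterface = namedtuple(
    "ArrayInterface",
    ["shape", "typestr", "descr", "strides", "mask", "offset", "data", "struct"],
)

# Position of the struct slot, computed once so consumers can index directly.
STRUCT = ArrayInterface._fields.index("struct")


def get_struct(iface):
    """Fast-path access: plain tuple indexing, no name lookup."""
    return iface[STRUCT]
```

Readers that prefer names can still write iface.struct or iface.shape, while
a consumer that cares about speed indexes by the known position.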

An advantage of having one module that you need to import is that we 
could use something other than CObject, which would allow us to 
bulletproof the array interface at the Python level. One nit with using a 
CObject is that I can pass an object that doesn't refer to a 
PyArrayInterface with unpleasant results.
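
As a Python-level analogy only: a shared module could define its own handle
type, so a consumer can reject anything that is not one before touching it.
ArrayStructHandle and unwrap are hypothetical names, and the integer address
is a stand-in for a real pointer to a PyArrayInterface.

```python
class ArrayStructHandle:
    """Hypothetical wrapper type that a shared helper module would define."""

    __slots__ = ("address",)

    def __init__(self, address):
        self.address = address  # stand-in for a PyArrayInterface pointer


def unwrap(handle):
    """Return the wrapped address, refusing objects of any other type."""
    if not isinstance(handle, ArrayStructHandle):
        raise TypeError(
            "expected ArrayStructHandle, got %s" % type(handle).__name__
        )
    return handle.address
```

With a generic CObject there is no equivalent check, which is the nit
described above.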

-tim