Re: [Numpy-discussion] Array interface
Magnus Lie Hetland wrote:
Travis Oliphant <oliphant@ee.byu.edu>:
I don't know if you have followed the array interface discussion. It is defined at http://numeric.scipy.org
This is very, very good! The numeric future of Python is looking very bright, IMO :)
Some tiny points:
- Shouldn't the regexp for __array_typestr__ be '[<>]?[tbiufcOSUV][0-9]+'?
Probably, since I guess you can only have one of < or >. Thanks.
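As a rough illustration of the regexp proposed above, here is a minimal Python sketch that validates a typestr and splits it into its parts. The helper name parse_typestr and the returned tuple layout are assumptions made for this example; they are not part of the interface description.

```python
import re

# Pattern as proposed in the thread: at most one byte-order character,
# one type character, then a size written in decimal digits.
TYPESTR_RE = re.compile(r'[<>]?[tbiufcOSUV][0-9]+')

def parse_typestr(typestr):
    """Return (byteorder, kind, size) or raise ValueError (hypothetical helper)."""
    if TYPESTR_RE.fullmatch(typestr) is None:
        raise ValueError(f"invalid typestr: {typestr!r}")
    byteorder = typestr[0] if typestr[0] in '<>' else ''
    rest = typestr[len(byteorder):]
    return byteorder, rest[0], int(rest[1:])
```

With the '?' quantifier, a string such as '<>i4' is rejected, which is the point of the suggested change.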
- What are the semantics when __array_typestr__ isn't V[0-9]+ and __array_descr__ is set? Is __array_typestr__ ignored? Or... What would it be used for?
I would say that the __array_descr__ always gives more information but not every array implementation will support looking at it. For example, current Numeric (24.0 in CVS) ignores __array_descr__ and just looks at the typestr (and doesn't support 'V'). So, I suspect that another array package that knows this may choose something else besides 'V' if it really wants Numeric to still understand it. Suppose you have a complex short int array with __array_descr__ = 'V8
- Does the description of __array_data__ mean that the discussed bytes type is no longer needed? (If we can use buffers, that sounds very good to me.)
Bytes is still needed because the buffer object is not very good, and we need a good buffer object in Python for lots of other reasons. It would be very useful, for example, to be able to allocate memory using the Python bytes object. But it does mean less pressure to get it to work.
- Why the parentheses around "buffer protocol-satisfying object" in the description of __array_mask__? And why must it be 'b1'? What if I happen to have mask data from a non-array-protocol source, which happens to be, say, b8 (not unreasonable, I think)? Wouldn't it be good to allow any size of these, and just use zero/non-zero as the criterion? Some of the point of this protocol is to avoid copying and to use the original data, after all...? (Same goes for the requirement that it be C-contiguous. I guess I'm basically saying that perhaps __array_mask__ should be an array itself. Or, at least, that it could be *allowed* to be...)
I added the mask late last night. It is probably the least thought out portion. Everything else has been through the wringer a couple more times. My whole thinking is that I just didn't want to explode the protocol with another special name for the mask type. But, saying that the mask object itself can support the array interface doesn't do that, so I think that is a good call.
Last night, using the numarray exporter interface and the Numeric consumer interface I was able to share data between a Numeric array and numarray array with no copying of the data buffers. It was very nice.
-Travis
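On the mask question, the zero/non-zero criterion suggested above could be sketched in Python roughly as follows. This is only an illustration: the function name mask_flags is hypothetical, and the sketch assumes the mask is a contiguous buffer-protocol object whose element width is given in bytes.

```python
def mask_flags(buf, itemsize=1):
    """Return one boolean per mask element, True where any byte is non-zero.

    The mask data is read through a memoryview, so the exporter's memory
    is not copied; only the resulting list of flags is new.
    """
    view = memoryview(buf).cast('B')  # byte-level view of the same memory
    if len(view) % itemsize:
        raise ValueError("buffer length is not a multiple of itemsize")
    return [any(view[i:i + itemsize]) for i in range(0, len(view), itemsize)]
```

Under this rule a consumer would not need to care how wide each mask element is, only whether it is zero.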
Let me finish this example:
Suppose you have a complex short int array with
__array_descr__ = [('real','i2'),('imag','i2')]
you could describe this as
__array_typestr__ = 'V4'
or think of it as a 4 byte integer if you want to make sure that another array package that may not support void pointers can still manipulate the data, and so the creator of the complex short int array may decide that
__array_typestr__ = 'i4'
is the right thing to do for packages that ignore the __array_descr__ attribute.
-Travis
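To make the example concrete, here is a small Python sketch of an exporter carrying both attributes, plus two consumers: one that reads only the typestr and one that also reads the descr. The class and function names are hypothetical, and the sketch assumes that __array_data__ may simply be a bytes object and that the fields are stored in native byte order.

```python
import struct

# Assumed mapping from the typestr codes used here to struct format codes.
STRUCT_CODES = {'i2': 'h', 'i4': 'i'}

class ComplexShortArray:
    """Hypothetical exporter: each item is two 16-bit integers."""
    __array_descr__ = [('real', 'i2'), ('imag', 'i2')]
    __array_typestr__ = 'V4'  # 'i4' is the alternative discussed above

    def __init__(self, pairs):
        self.__array_data__ = b''.join(
            struct.pack('=hh', real, imag) for real, imag in pairs)

def item_size(obj):
    """All a typestr-only consumer learns: the item size in bytes."""
    return int(obj.__array_typestr__.lstrip('<>')[1:])

def read_fields(obj):
    """A descr-aware consumer can unpack the named fields of every item."""
    names = [name for name, _ in obj.__array_descr__]
    fmt = '=' + ''.join(STRUCT_CODES[code] for _, code in obj.__array_descr__)
    return [dict(zip(names, values))
            for values in struct.iter_unpack(fmt, obj.__array_data__)]
```

A consumer that ignores the descr still knows each item is 4 bytes wide; only the descr-aware one can recover the real and imag fields.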
Travis Oliphant <oliphant@ee.byu.edu>:
[snip]
Let me finish this example:
Suppose you have a complex short int array with
__array_descr__ = [('real','i2'),('imag','i2')]
you could describe this as
__array_typestr__ = 'V4'
Sure -- I can see how using 'V' makes sense... You're just telling the host program how many bytes you've got, and that's it. That makes sense to me. What I wondered about was what happened when you use a more specific (and conflicting) type for the typestr...
or think of it as a 4 byte integer if you want to make sure that another array package that may not support void pointers can still manipulate the data, and so the creator of the complex short int array may decide that
__array_typestr__ = 'i4'
This is basically what I'm wondering about. It would make sense (to me) to say that the data type was 'V4', because that's simply less specific, in a way. But saying 'i4' is just as specific as the complex example, above -- but it means something else! You're basically giving the program permission to interpret a four-byte complex number as a four-byte integer, aren't you? Sounds almost like a recipe for disaster to me :}
On the other hand -- there is no complex integer type in the interface, and using 'c4' probably would be completely wrong as well. I would almost be tempted to say that if __array_descr__ is in use, __array_typestr__ *has* to use the 'V' type. (Or, one could make some more complicated rules, perhaps, in order to allow other types.)
As for not supporting the 'V' type -- would that really be considered a conforming implementation? According to the spec, "Objects wishing to support an N-dimensional array in application code should look for these attributes and use the information provided appropriately". The typestr is required, so... Perhaps the spec should be explicit about the shoulds/musts/mays of the specific typecodes? What must be supported, what may be supported etc.? Or perhaps that doesn't make sense?
It just seems almost too bad that one package would have to know what another package supports in order to formulate its own typestr... It sort of throws part of the interoperability out the window.
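The concern about reading a four-byte complex value as a four-byte integer can be shown with a few lines of Python. This is just a sketch: the helper names are made up, and little-endian layout is assumed so that the numbers are the same on every machine.

```python
import struct

def as_int32(real, imag):
    """Reinterpret a (real, imag) pair of 16-bit ints as one 32-bit int."""
    return struct.unpack('<i', struct.pack('<hh', real, imag))[0]

def as_pair(value):
    """Reinterpret one 32-bit int as a (real, imag) pair of 16-bit ints."""
    return struct.unpack('<hh', struct.pack('<i', value))

packed = as_int32(1, 2)      # 131073, i.e. 0x00020001
negated = as_pair(-packed)   # (-1, -3), not the (-1, -2) one might expect
```

A consumer that treats the data as 'i4' can move it around safely, but any arithmetic it performs operates on the packed integer and so does not act on the two components separately.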
is the right thing to do for packages that ignore the __array_descr__ attribute.
-Travis
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
Travis Oliphant <oliphant@ee.byu.edu>:
[snip]
Last night, using the numarray exporter interface and the Numeric consumer interface I was able to share data between a Numeric array and numarray array with no copying of the data buffers. It was very nice.
Wow -- a historic moment :) Now, if we can only get the stdlib's array module to support this protocol (and sprout some more dimensions), as you mentioned... That would really be cool.
-- Magnus Lie Hetland Fall seven times, stand up eight http://hetland.org [Japanese proverb]
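As a sketch of the zero-copy idea using only the standard library as it exists in current Python: array.array exposes its memory through the buffer protocol, and a memoryview can present that memory with more than one dimension. Note that this is the buffer protocol, not the attribute-based interface discussed in this thread, and the variable names are only illustrative.

```python
from array import array

shorts = array('h', [1, 2, 3, 4, 5, 6])

# A memoryview shares the array's memory; no data is copied.
flat = memoryview(shorts)

# The same memory viewed as a 2x3 grid (cast via bytes, then back with a shape).
grid = flat.cast('B').cast('h', shape=[2, 3])

# A write through the flat view is visible in the array and in the grid.
flat[5] = 60
```

After the last line, shorts[5] is 60 and grid.tolist() gives [[1, 2, 3], [4, 5, 60]], since all three names refer to the same underlying buffer.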