--- Travis Oliphant <oliphant@ee.byu.edu> wrote:
> Thank you for your detailed explanations. This is starting to make more sense to me. It is obvious that you understand what we are trying to do, and I pretty much agree with you in how you think it should be done. I think you do a great job of explaining things.
>
> I agree we should come up with a set of names for the interface to arrayobjects. I'm even convinced that offset should be an optional part of the interface (implied 0 if it's not there).
Very cool! You just made my day.

I wish I had time to do a good writeup, but I need to catch a flight in a couple of hours, and I won't be back behind my computer until Wednesday night.

Here is an initial stab:

__array_shape__
    Required, a sequence (typically tuple) of non-negative int/longs

__array_storage__
    Required, a buffer or possibly sequence object (list)
    (Required unless the object supports PyBufferProcs directly? I don't have a strong opinion on that one...)
    A slightly different name to indicate it could be a buffer or sequence object (like a list). Typically buffer.

__array_itemtype__
    Suggested, but Optional if __array_itemsize__ is present.
    This attribute probably warrants some discussion...
    A struct module format string or one of the additional ones that needs to be added. Need to discuss "long double" and "Object". (Capital 'O' for Object, Capital 'D' for long double, Capital 'X' for bit?)
    If not present or the empty string '', indicates that the array elements can only be treated as blobs and the real data representation must be gotten from some other means.
    I think doubling the typecode as a convention to denote complex numbers makes some sense (for instance 'ff' is complex float).
    The struct module convention for denoting native, portable big endian, and portable little endian is concise and documented.

__array_itemsize__
    Optional if __array_itemtype__ is present and the value can be calculated from struct.calcsize(__array_itemtype__)

__array_strides__
    Optional if the array data is in a contiguous C layout. Required otherwise. Same length as __array_shape__.
    Indicates how much to multiply subscripts by to get to the desired position in the storage.
    A sequence (typically tuple) of ints/longs. These are in byte offsets (not element_size offsets) for most arrays. Special exceptions made for:
        Tightly packed (8 bits to a byte) bitmask arrays, where the offsets are bit indexes
        PyObject arrays (lists) where the offsets are indexes
    They should be byte offsets to handle non-aligned data or data with odd packing.
    Fortran arrays might be common enough to warrant special casing. We could discuss whether a __array_fortran__ attribute indicates that the array is in contiguous Fortran layout

__array_offset__
    Optional and defaults to zero. An int/long indicating the offset to treat as the zeroth element

__array_complicated__
    Optional and defaults to zero/false. This is a kluge to indicate that while yes the data is an array, the storage layout cannot be easily described by the shape/strides/offset combination alone.
    This could warrant some discussion.

__array_fortran__
    Optional and defaults to zero/false. If you want to represent Fortran arrays without creating strides for them, this would be necessary. I'd vote to leave it out and stick with strides...

These are all just suggestions. Is something important missing?

Predicates like iscontiguous(a) and isfortran(a) can all be easily determined from the above. The ndims or rank is simply len(a.__array_shape__).

I wish I had more time to respond to some of the other things in your message, but I'm gone until Wednesday night...

Cheers,
-Scott
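
To make the remark about predicates concrete, here is a rough sketch of how a consumer might derive itemsize, strides, element position, iscontiguous, isfortran and ndims from the proposed attributes. Nothing here is settled: the helper names (itemsize, c_strides, fortran_strides, strides, byte_position) and the Demo class are made up for illustration, __array_offset__ is assumed to be in bytes, and only the ordinary byte-stride case is handled (not the bitmask or PyObject list exceptions).

```python
import struct


def itemsize(obj):
    """Element size in bytes: explicit __array_itemsize__, else from the struct format."""
    size = getattr(obj, "__array_itemsize__", None)
    if size is None:
        size = struct.calcsize(obj.__array_itemtype__)
    return size


def c_strides(shape, size):
    """Byte strides for a contiguous C (row-major) layout."""
    result, step = [], size
    for dim in reversed(shape):
        result.append(step)
        step *= dim
    return tuple(reversed(result))


def fortran_strides(shape, size):
    """Byte strides for a contiguous Fortran (column-major) layout."""
    result, step = [], size
    for dim in shape:
        result.append(step)
        step *= dim
    return tuple(result)


def strides(obj):
    """__array_strides__ if present, otherwise the implied contiguous C strides."""
    declared = getattr(obj, "__array_strides__", None)
    if declared is None:
        return c_strides(obj.__array_shape__, itemsize(obj))
    return tuple(declared)


def byte_position(obj, index):
    """Byte position of element `index` in the storage (offset assumed to be in bytes)."""
    offset = getattr(obj, "__array_offset__", 0)
    return offset + sum(i * s for i, s in zip(index, strides(obj)))


def iscontiguous(obj):
    return strides(obj) == c_strides(obj.__array_shape__, itemsize(obj))


def isfortran(obj):
    return strides(obj) == fortran_strides(obj.__array_shape__, itemsize(obj))


def ndims(obj):
    return len(obj.__array_shape__)


class Demo:
    """Hypothetical 2x3 array of little-endian doubles exposing only the required names."""
    __array_shape__ = (2, 3)
    __array_itemtype__ = "<d"
    __array_storage__ = bytearray(2 * 3 * 8)


# Write element [1, 2] through the computed position, then read it back.
struct.pack_into("<d", Demo.__array_storage__, byte_position(Demo, (1, 2)), 3.5)
value, = struct.unpack_from("<d", Demo.__array_storage__, byte_position(Demo, (1, 2)))
```

Comparing strides against the computed C or Fortran strides is the simplest possible test; it treats arrays with length-1 dimensions and unusual strides as non-contiguous, which a real implementation may want to relax.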