[Numpy-discussion] Re: Bytes Object and Metadata
Scott Gilbert
xscottg at yahoo.com
Mon Mar 28 10:30:22 EST 2005
--- Travis Oliphant <oliphant at ee.byu.edu> wrote:
>
> Thank you for your detailed explanations. This is starting to make more
> sense to me. It is obvious that you understand what we are trying to
> do, and I pretty much agree with you in how you think it should be
> done. I think you do a great job of explaining things.
>
> I agree we should come up with a set of names for the interface to
> arrayobjects. I'm even convinced that offset should be an optional part
> of the interface (implied 0 if it's not there).
>
Very cool! You just made my day.
I wish I had time to do a good writeup, but I need to catch a flight in a
couple hours, and I won't be back behind my computer until Wednesday night.
Here is an initial stab:
__array_shape__
    Required, a sequence (typically tuple) of non-negative int/longs

__array_storage__
    Required, a buffer or possibly sequence object (list)
    (Required unless the object supports PyBufferProcs directly?
    I don't have a strong opinion on that one...)
    A slightly different name to indicate it could be a buffer or
    sequence object (like a list). Typically buffer.

__array_itemtype__
    Suggested, but Optional if __array_itemsize__ is present.
    This attribute probably warrants some discussion...
    A struct module format string or one of the additional ones
    that need to be added. Need to discuss "long double" and
    "Object". (Capital 'O' for Object, Capital 'D' for long double,
    Capital 'X' for bit?)
    If not present or the empty string '', indicates that the
    array elements can only be treated as blobs and the real
    data representation must be gotten from some other means.
    I think doubling the typecode as a convention to denote complex
    numbers makes some sense (for instance 'ff' is complex float).
    The struct module convention for denoting native, portable
    big endian, and portable little endian is concise and documented.

__array_itemsize__
    Optional if __array_itemtype__ is present and the value can be
    calculated from struct.calcsize(__array_itemtype__)

__array_strides__
    Optional if the array data is in a contiguous C layout.
    Required otherwise. Same length as __array_shape__.
    Indicates how much to multiply subscripts by to get to
    the desired position in the storage.
    A sequence (typically tuple) of ints/longs. These are
    byte offsets (not element_size offsets) for most arrays.
    Special exceptions made for:
        Tightly packed (8 bits to a byte) bitmask arrays, where
            the offsets are bit indexes
        PyObject arrays (lists) where the offsets are indexes
    They should be byte offsets to handle non-aligned data or data
    with odd packing.
    Fortran arrays might be common enough to warrant special casing.
    We could discuss whether a __array_fortran__ attribute indicates
    that the array is in contiguous Fortran layout.

__array_offset__
    Optional and defaults to zero. An int/long indicating the offset
    to treat as the zeroth element

__array_complicated__
    Optional and defaults to zero/false. This is a kluge to indicate
    that while yes the data is an array, the storage layout can not
    be easily described by the shape/strides/offset combination alone.
    This could warrant some discussion.

__array_fortran__
    Optional and defaults to zero/false. If you want to represent
    Fortran arrays without creating strides for them, this would
    be necessary. I'd vote to leave it out and stick with strides...
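
To make the addressing rule concrete, here is a rough sketch of how a consumer might read one element through these attributes. It is only an illustration of the proposal, not an existing API: the Grid class and the get_item and c_strides helpers are made-up names, the storage is assumed to be a bytes buffer of native doubles, and __array_offset__ is assumed to be in bytes (the list above does not fix its unit).

```python
import struct


class Grid:
    """Hypothetical 2x3 array of native doubles using the proposed names."""
    __array_shape__ = (2, 3)
    __array_itemtype__ = 'd'  # struct module format string
    __array_storage__ = struct.pack('6d', 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)


def c_strides(shape, itemsize):
    """Byte strides of a contiguous C layout (last index varies fastest)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def get_item(a, index):
    """Fetch one element, filling in defaults for the optional attributes."""
    itemtype = a.__array_itemtype__
    itemsize = getattr(a, '__array_itemsize__', None)
    if itemsize is None:
        itemsize = struct.calcsize(itemtype)
    strides = getattr(a, '__array_strides__', None)
    if strides is None:
        # Missing strides imply a contiguous C layout.
        strides = c_strides(a.__array_shape__, itemsize)
    # Assumption: the offset is counted in bytes, like the strides.
    offset = getattr(a, '__array_offset__', 0)
    position = offset + sum(i * s for i, s in zip(index, strides))
    return struct.unpack_from(itemtype, a.__array_storage__, position)[0]


print(get_item(Grid(), (1, 2)))  # element at row 1, column 2: 5.0
```

The bit-packed and PyObject list exceptions, and anything flagged __array_complicated__, are deliberately left out of this sketch.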
These are all just suggestions. Is something important missing?
Predicates like iscontiguous(a) and isfortran(a) can all be easily
determined from the above. The ndims or rank is simply
len(a.__array_shape__).
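
As a rough sketch of how those predicates could fall out of the attributes, assuming byte strides and ignoring the bit-packed, PyObject list, __array_complicated__ and __array_fortran__ cases: the helper names and the RowMajor and ColMajor example classes below are hypothetical, and the comparison is strict, so dimensions of length 0 or 1 (where the stride does not matter) are not given any special treatment.

```python
import struct


def c_strides(shape, itemsize):
    """Byte strides of a contiguous C layout (last index varies fastest)."""
    strides, step = [], itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def fortran_strides(shape, itemsize):
    """Byte strides of a contiguous Fortran layout (first index fastest)."""
    strides, step = [], itemsize
    for extent in shape:
        strides.append(step)
        step *= extent
    return tuple(strides)


def describe(a):
    """Return (shape, itemsize, strides) with the optional parts filled in."""
    shape = tuple(a.__array_shape__)
    itemsize = getattr(a, '__array_itemsize__', None)
    if itemsize is None:
        itemsize = struct.calcsize(a.__array_itemtype__)
    strides = getattr(a, '__array_strides__', None)
    if strides is None:
        strides = c_strides(shape, itemsize)  # default is C contiguous
    return shape, itemsize, tuple(strides)


def ndims(a):
    return len(a.__array_shape__)


def iscontiguous(a):
    shape, itemsize, strides = describe(a)
    return strides == c_strides(shape, itemsize)


def isfortran(a):
    shape, itemsize, strides = describe(a)
    return strides == fortran_strides(shape, itemsize)


class RowMajor:
    """Hypothetical 2x3 array of doubles with no explicit strides."""
    __array_shape__ = (2, 3)
    __array_itemtype__ = 'd'


class ColMajor(RowMajor):
    """Same shape, but with column-major byte strides spelled out."""
    __array_strides__ = (8, 16)


print(ndims(RowMajor), iscontiguous(RowMajor), isfortran(RowMajor))
print(ndims(ColMajor), iscontiguous(ColMajor), isfortran(ColMajor))
```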
I wish I had more time to respond to some of the other things in your
message, but I'm gone until Wednesday night...
Cheers,
-Scott