[Numpy-discussion] Re: Bytes Object and Metadata

Mon Mar 28 10:30:22 EST 2005

--- Travis Oliphant <oliphant at ee.byu.edu> wrote:
> 
> Thank you for your detailed explanations.  This is starting to make more 
> sense to me.  It is obvious that you understand what we are trying to 
> do, and I pretty much agree with you in how you think it should be 
> done.  I think you do a great job of explaining things. 
> 
> I agree we should come up with a set of names for the interface to 
> arrayobjects.  I'm even convinced that offset should be an optional part 
> of the interface (implied 0 if it's not there).
> 

Very cool!  You just made my day.

I wish I had time to do a good writeup, but I need to catch a flight in a
couple hours, and I won't be back behind my computer until Wednesday night.
Here is an initial stab:

  __array_shape__   
       Required, a sequence (typically tuple) of non-negative int/longs

  __array_storage__
       Required, a buffer or possibly sequence object (list)

       (Required unless the object supports PyBufferProcs directly?
        I don't have a strong opinion on that one...)

       A slightly different name to indicate it could be a buffer or
       sequence object (like a list).  Typically buffer.

  __array_itemtype__
       Suggested, but optional if __array_itemsize__ is present.

       This attribute probably warrants some discussion...

       A struct module format string or one of the additional ones
       that need to be added.  Need to discuss "long double" and
       "Object".  (Capital 'O' for Object, capital 'D' for long double,
       capital 'X' for bit?)

       If not present or the empty string '', indicates that the
       array elements can only be treated as blobs and the real
       data representation must be obtained by some other means.

       I think doubling the typecode as a convention to denote complex
       numbers makes some sense (for instance 'ff' is complex float).

       The struct module convention for denoting native, portable
       big endian, and portable little endian is concise and documented.

  __array_itemsize__
       Optional if __array_itemtype__ is present and the value can be
       calculated from struct.calcsize(__array_itemtype__).

  __array_strides__
       Optional if the array data is in a contiguous C layout.
       Required otherwise.  Same length as __array_shape__.
       Indicates how much to multiply subscripts by to get to
       the desired position in the storage.

       A sequence (typically tuple) of ints/longs.  These are
       byte offsets (not element_size offsets) for most arrays.
       Special exceptions made for:
           Tightly packed (8 bits to a byte) bitmask arrays, where
           the offsets are bit indexes

           PyObject arrays (lists), where the offsets are indexes

       They should be byte offsets to handle non-aligned data or data
       with odd packing.

       Fortran arrays might be common enough to warrant special casing.
       We could discuss whether an __array_fortran__ attribute should
       indicate that the array is in contiguous Fortran layout.

  __array_offset__
       Optional and defaults to zero.  An int/long indicating the offset
       into the storage to treat as the zeroth element.

  __array_complicated__
       Optional and defaults to zero/false.  This is a kluge to indicate
       that while yes the data is an array, the storage layout cannot
       be easily described by the shape/strides/offset combination alone.

       This could warrant some discussion.

  __array_fortran__
       Optional and defaults to zero/false.  If you want to represent
       Fortran arrays without creating strides for them, this would
       be necessary.  I'd vote to leave it out and stick with strides...

These are all just suggestions.  Is something important missing?
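
To make the list above a bit more concrete, here is a rough Python sketch of
how a consumer might read these attributes and locate one element.  The
helper names (item_size, c_strides, element_offset, get_item) and the Demo
class are hypothetical, made up for illustration only.  The sketch assumes
byte strides, a bytes-like storage, and that __array_offset__ is counted in
bytes; it ignores the bitmask and PyObject special cases.

```python
import struct

def item_size(obj):
    """Element size in bytes, preferring __array_itemsize__ if present."""
    size = getattr(obj, "__array_itemsize__", None)
    if size is not None:
        return size
    # Fall back to the struct-module format string, e.g. 'd' or 'ff'.
    return struct.calcsize(obj.__array_itemtype__)

def c_strides(shape, itemsize):
    """Byte strides implied by a contiguous C (row-major) layout."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))

def element_offset(obj, index):
    """Byte position in the storage of the element at `index`."""
    shape = obj.__array_shape__
    strides = getattr(obj, "__array_strides__", None)
    if strides is None:
        # Missing strides mean a contiguous C layout.
        strides = c_strides(shape, item_size(obj))
    # Assumption: the offset is in bytes and defaults to zero.
    offset = getattr(obj, "__array_offset__", 0)
    return offset + sum(i * s for i, s in zip(index, strides))

def get_item(obj, index):
    """Unpack a single element using the item type as a struct format."""
    pos = element_offset(obj, index)
    fmt = obj.__array_itemtype__
    return struct.unpack_from(fmt, obj.__array_storage__, pos)[0]

class Demo:
    """Hypothetical 2x3 array of native doubles, no strides or offset."""
    __array_shape__ = (2, 3)
    __array_itemtype__ = "d"
    __array_storage__ = struct.pack("6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
```

With this, get_item(Demo, (1, 2)) reads the last element of the storage.
Keeping the strides in bytes is what lets the same arithmetic work for
non-aligned or oddly packed data.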

Predicates like iscontiguous(a) and isfortran(a) can all be easily
determined from the above.  The ndims or rank is simply
len(a.__array_shape__).

I wish I had more time to respond to some of the other things in your
message, but I'm gone until Wednesday night...

Cheers,
    -Scott