Re: [Numpy-discussion] Data type change completed

Travis Oliphant oliphant.travis at ieee.org
Mon Dec 5 21:44:02 EST 2005

Colin J. Williams wrote:

> Travis Oliphant wrote:
>> I've committed the data-type change discussed at the end of last week 
>> to the SVN repository.  Now the concept of a data type for an array 
>> has been replaced with a "data-descriptor".  This data-descriptor is 
>> flexible enough to handle an arbitrary record specification with 
>> fields that include records and arrays or arrays of records.  While 
>> nesting may not be the best data-layout for a new design, when 
>> memory-mapping an arbitrary fixed-record-length file, this capability 
>> allows you to handle even the most obsure record file.
> Does this mean that the dtype parameter is changed?  obscure??

No, it's not changed.  The dtype parameter is still used and it is still 
called the same thing.   It's just that what constitutes a data-type has 
changed significantly.

For example, tuples and dictionaries can now be used to describe a 
data-type.  These definitions are recursive: wherever "data-type" 
appears below, it means anything that can be interpreted as a data-type.  
Strictly speaking I mean data-descriptor, but data-type is in such 
common usage that I still use it.

(fixed-size-data-type, shape)
(generic-size-data-type, itemsize)
(base-type-data-type, new-type-data-type)


dtype=(int32, (5,5))  ---  each item is described as a 5x5 array of int32
dtype=(str, 10)  ---  a length-10 string
dtype=(int16, {'real':(int8,0),'imag':(int8,1)})  ---  a descriptor that
    acts like an int16 array mathematically (in ufuncs) but has real and
    imag fields
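
For anyone trying the three tuple forms against a current NumPy rather than the 2005 SVN tree, here is a minimal sketch.  It assumes the package is imported as np (not how it was spelled when this message was written) and uses np.bytes_ in place of str to get a 10-byte string, since str now maps to a unicode type.

```python
import numpy as np

# (fixed-size-data-type, shape): each item is itself a 5x5 block of int32
sub = np.dtype((np.int32, (5, 5)))
print(sub.shape, sub.itemsize)   # (5, 5) 100

# (generic-size-data-type, itemsize): a 10-byte string
s10 = np.dtype((np.bytes_, 10))
print(s10.itemsize)              # 10

# (base-type-data-type, new-type-data-type): an int16 whose two bytes
# can also be addressed as the int8 fields 'real' and 'imag'
pair = np.dtype((np.int16, {'real': (np.int8, 0), 'imag': (np.int8, 1)}))
print(pair.itemsize, pair.names)
```

The field offsets are byte offsets inside the base type, so they have to fit within the two bytes of an int16.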

Dictionary (defaults to a dtypechar == 'V')

{"names": list-of-field-names,
  "formats": list-of-data-types,

  "offsets": list-of-field-start-offsets,
  "titles": list-of-extra-field-names}

Dictionary, format 2 (and how it's stored internally)

{key1 : (data-type1, offset1 [, title1]),
  key2 : (data-type2, offset2 [, title2]),
  ...
  keyn : (data-typen, offsetn [, titlen])}
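
To make the two dictionary spellings concrete, here is a small sketch against a current NumPy (imported as np, which is an assumption on my part).  The field names "id" and "value" and their titles are made up for illustration.  Both spellings should describe the same 16-byte record, and the fields attribute shows the per-key tuples that the second format mirrors.

```python
import numpy as np

# Format 1: parallel lists of names, formats, offsets and titles
by_lists = np.dtype({
    "names": ["id", "value"],
    "formats": [np.int32, np.float64],
    "offsets": [0, 8],                 # leaves 4 unused bytes after 'id'
    "titles": ["Identifier", "Measured value"],
})

# Format 2: one (data-type, offset [, title]) tuple per key
by_keys = np.dtype({
    "id": (np.int32, 0, "Identifier"),
    "value": (np.float64, 8, "Measured value"),
})

print(by_lists.char, by_lists.itemsize)   # V 16
print(by_keys.names)                      # ('id', 'value')
print(by_lists.fields["value"])           # (data-type, offset, title)
```

Explicit offsets are what make this useful for memory-mapping a fixed-record-length file whose layout has gaps or padding.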

Other objects not already covered:
Right now, the tp_dict of the type object is simply passed to the 
dictionary-conversion routine.
I'm open to ideas here and will probably have better ones once I see 
what the actual record data-type (not the data-descriptor, but an actual 
subclass of the scipy.void data type) looks like.

All of these can be used as the dtype parameter wherever it is accepted 
(of course, you can't always do something useful with every 
data-descriptor).

When an ndarray has an associated data-descriptor with fields (the 
descriptor is where the field information is stored), those fields can 
be accessed by passing string or unicode keys to the getitem call.

Thus, you can do something like this:

 >>> a = ones((4,3), dtype=(int16, {'real':(int8, 0), 'imag':(int8, 1)}))
 >>> a['imag'] = 2
 >>> a['real'] = 1
 >>> a.tostring()
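
A rough equivalent for a current NumPy, as a sketch only: instead of passing the combined (int16, {...}) descriptor to ones, it creates a plain int16 array and looks at the same memory through a two-field descriptor with view, and it uses tobytes where the session above uses tostring.  The names fields, parts and raw are mine.

```python
import numpy as np

# Two int8 fields packed into 2 bytes, the same size as one int16
fields = np.dtype({"names": ["real", "imag"], "formats": [np.int8, np.int8]})

a = np.ones((4, 3), dtype=np.int16)
parts = a.view(fields)     # same memory and shape, now with named fields

parts["imag"] = 2          # second byte of every element
parts["real"] = 1          # first byte of every element

raw = a.tobytes()
print(len(raw))            # 24
print(raw[:4])             # b'\x01\x02\x01\x02'
```

Every element ends up as the byte pair 01 02, so the int16 value you read back from a depends on the byte order of the machine.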

Note that there are now three distinct but interacting Python objects:

1) the N-dimensional array of a fixed itemsize.
2) a Python object representing one element of the array.
3) the data-descriptor object describing the data-type of the array.

These three things were always there under the covers (the 
PyArray_Descr* has been there since Numeric), and the standard Python 
types were always filling in for number 2.  Now we are just being more 
explicit about it.
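
The three objects can be looked at side by side in a current NumPy.  This is a sketch with a made-up two-field record, assuming the package is imported as np.

```python
import numpy as np

arr = np.zeros(3, dtype=[("x", np.float64), ("n", np.int32)])

elem = arr[0]      # 2) a Python object for one element of the array
desc = arr.dtype   # 3) the data-descriptor shared by every element

print(type(arr).__name__)            # ndarray
print(type(elem).__name__)           # void
print(isinstance(desc, np.dtype))    # True
print(arr.itemsize, desc.itemsize)   # 12 12
```

The element carries the same descriptor as the array it came from, which is what ties the three objects together.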

Now, all three things are present and accounted for.  I'm really quite 
happy with the resulting infrastructure. I think it will allow some 
really neat possibilities.

I'm thinking the record array subclass will allow attribute-based 
look-up and register a nice record type for the actual "element" of 
the record array.
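
For reference, a sketch of how this idea reads in a current NumPy, where a record array subclass (np.recarray) and an element type (np.record) are available.  Nothing here describes the state of the code at the time of this message, and the field names are made up.

```python
import numpy as np

plain = np.zeros(2, dtype=[("x", np.float64), ("n", np.int32)])
plain["n"] = [7, 8]

rec = plain.view(np.recarray)   # same data, attribute-based look-up
print(rec.n)                    # the 'n' field as an array

first = rec[0]                  # one element of the record array
print(type(first).__name__)     # record
print(first.n)                  # 7
```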

