Travis,

This is intended to restore the context of your response. TO is Travis Oliphant, CW is Colin Williams.

TO: The dtype parameter is still used and it is still called the same thing. It's just that what constitutes a data-type has changed significantly. For example, tuples and dictionaries can now be used to describe a data-type. These definitions are recursive, so that wherever data-type is used it means anything that can be interpreted as a data-type. And I really mean data-descriptor, but data-type is in such common usage that I still use it.

CW: This would appear to be a good step forward but with all of the immutable types (int8, FloatType, TupleType, etc.) the data is stored in the ArrayType instance (array_data?) whereas, with a dictionary, it would appear to be necessary to store the items outside the array. Is that desirable?

TO: I don't even understand what you are questioning here. Perhaps you misunderstood me. Items are always stored in the array. Even record "items" are stored in the array. The descriptor just allows you to describe what kind of data and how it is stored in the array.

Response: Yes, I had assumed that int8 and FloatType are equally data-descriptors, i.e. objects which describe the elements of a numeric array. Wrongly, I assumed that TupleType and DictType are also data descriptors. Mea culpa.

On another subject, it would help, for those of us who are not aficionados of C, if you provided the equivalent Python def statement for those functions implemented in C, especially the ArrayType's __new__ method, perhaps in the docstrings.

Colin W.

Travis Oliphant wrote:
Colin J. Williams wrote:
This would appear to be a good step forward but with all of the immutable types (int8, FloatType, TupleType, etc.) the data is stored in the ArrayType instance (array_data?) whereas, with a dictionary, it would appear to be necessary to store the items outside the array. Is that desirable?
I don't even understand what you are questioning here. Perhaps you misunderstood me. Items are always stored in the array. Even records "items" are stored in the array. The descriptor just allows you to describe what kind of data and how it is stored in the array.
Even the tuple can have its content modified, as the example below shows:
I don't understand what you mean to show by the tuple-content-modifying example.
dtype=(int32, (5,5)) --- a 5x5 array of int32 is the description of this item.
dtype=(str, 10) --- a length-10 string
So dtype now contains both the data type of each element and the shape of the array? This seems a significant change from numarray or Numeric.
No, no. Standard usage is the same. In normal usage you would not create an array this way. You could, of course, but it's not the documented procedure. The reason for this descriptor is to allow you to have a field-element that itself is an array of items.
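As a rough illustration of the point above, here is a minimal sketch written against a present-day NumPy rather than the SVN code under discussion. The field names `id` and `grid` are made up for the example, and `np.bytes_` stands in for the length-10 string.

```python
import numpy as np

# A bare (base, shape) descriptor: each item is a 5x5 block of int32.
block = np.dtype((np.int32, (5, 5)))

# A length-10 byte string descriptor.
name10 = np.dtype((np.bytes_, 10))

# The intended use: a record field that is itself an array.
# "id" and "grid" are hypothetical field names.
rec = np.dtype([("id", np.int32), ("grid", np.int32, (5, 5))])
a = np.zeros(3, dtype=rec)
a["grid"][0] = np.arange(25).reshape(5, 5)
grid_shape = a["grid"].shape          # one 5x5 block per record

# Used directly as an array dtype, the subarray shape is folded
# into the shape of the array and the dtype is plain int32.
direct = np.zeros(3, dtype=block)
```

The last two lines show why this is not the usual way to create an array: the result is an ordinary int32 array of shape (3, 5, 5), so passing the shape directly is simpler.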
dtype=(int16, {'real':(int8,0),'imag':(int8,4)}) --- a descriptor that acts like an int16 array mathematically (in ufuncs) but has real and imag fields.
This adds complexity, is there a compensating benefit? Do all of the complex operations apply?
I'm only showing what is possible and that the notion of data-type descriptor is complete.
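The idea of one item seen both as an int16 and as two int8 fields can be approximated in a present-day NumPy with a view. This is a sketch under that assumption, not the combined descriptor itself: it keeps a plain int16 array for arithmetic and overlays a two-field descriptor (offsets 0 and 1) on the same memory.

```python
import numpy as np

# Two int8 fields occupying the two bytes of one int16 item.
parts = np.dtype({"names": ["real", "imag"],
                  "formats": [np.int8, np.int8],
                  "offsets": [0, 1]})

a = np.zeros((4, 3), dtype=np.int16)   # plain int16 as far as ufuncs go
f = a.view(parts)                      # same memory, seen as fields
f["real"] = 1
f["imag"] = 2

raw = a.tobytes()                      # the bytes written through the fields
doubled = a + a                        # ordinary int16 arithmetic still applies
```

The byte string in `raw` alternates `\x01` and `\x02` because the fields sit at fixed byte offsets. The integer value of each int16 item depends on the machine's byte order.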
Why not clean things up by dropping typechar? These seemed to be one of the warts in numarray, only carried forward for compatibility reasons. Could the compatibility objectives of the project not be achieved, outside the ArrayType object, with a wrapper of some sort?
Too hard to do at this point. Too much code uses the characters. I also don't mind the characters so much (the struct module and the Python array module use them).
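For reference, a small sketch of how the same type character is understood by the struct module, the array module, and a present-day NumPy. It assumes a platform where C short is 16 bits wide, which is the usual case.

```python
import array
import struct

import numpy as np

code = "h"  # C short in all three modules

# Item size in bytes as reported by each module for the same character.
sizes = {
    "struct": struct.calcsize(code),
    "array": array.array(code).itemsize,
    "numpy": np.dtype(code).itemsize,
}

# Going the other way: the character NumPy reports for int16.
char_of_int16 = np.dtype(np.int16).char
```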
Couldn't the use of records avoid the cumbersome use of keys?
Yes. But, this is how you specify fields generally.
format2 (and how it's stored internally)
{key1 : (data-type1, offset1 [, title1]), key2 : (data-type2, offset2 [, title2]), ... keyn : (data-typen, offsetn [, titlen]) }
This is cleaner, but couldn't this information be contained within the Record instance?
Yes, but I am describing a general concept of data-type description not just something applicable to records.
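A minimal sketch of the key-to-(data-type, offset) dictionary, assuming a present-day NumPy, where this spelling is still accepted but discouraged in favour of the names/formats/offsets form. The field names `count` and `flag` are invented for the example.

```python
import numpy as np

# {key: (data-type, offset)} with hypothetical field names.
desc = np.dtype({"count": (np.int32, 0), "flag": (np.int8, 4)})

# The descriptor records each field's type and byte offset.
count_type, count_offset = desc.fields["count"][:2]
flag_type, flag_offset = desc.fields["flag"][:2]

# The items live in the array; the descriptor only says how to read them.
a = np.zeros(2, dtype=desc)
a["count"] = [10, 20]
a["flag"] = 1
```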
Thus, you can do something like this:
a = ones((4,3), dtype=(int16, {'real':(int8, 0), 'imag':(int8, 1)}))
a['imag'] = 2
a['real'] = 1
a.tostring()
'\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02'
Or, one could have something like:

class SmallComplex(Record):
    ''' This class typically has no instances in user code. '''
    real= (int8, )
    imag= (int8)
    def __init__(self):
        ....
    def __new__(self):
        ....
a = ones((4,3), dtype= SmallComplex)
a.imag = 2
a.real = 1
a.tostring()
'\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02\x01\x02'
Yes, something like this should be possible, though I have not fleshed out the user-interface at all. I've just been working on the basic structure that would support such things. Do:
class small_complex(record):
    dtypedescr = {'r':(int8,0),'i':(int8,1)}
a = ndrecarray((4,3), formats=small_complex)
a.r = 1
a.i = 2
a.tostring()
and your example would work (with current SVN...)
The ndrecarray subclass allows the attribute-to-field conversion, regular arrays do not.
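The difference between the two kinds of array can be sketched with a present-day NumPy, where the record-array class is `np.recarray` rather than the `ndrecarray` of the SVN code. The field names `r` and `i` follow the example above; `real` and `imag` would clash with attributes every ndarray already has.

```python
import numpy as np

fields = np.dtype([("r", np.int8), ("i", np.int8)])

# A regular array: fields are reached by key only.
plain = np.zeros((4, 3), dtype=fields)
plain["r"] = 1
plain["i"] = 2
plain_has_attr = hasattr(plain, "r")

# A record-array view of the same memory: fields are also attributes.
rec = plain.view(np.recarray)
rec.r = 3
rec.i = 4
raw = rec.tobytes()
```

Because `rec` is a view, the assignments through `rec.r` and `rec.i` also change what `plain["r"]` and `plain["i"]` return.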
-Travis