A means for specifying a recarray format might be created from tuples, type objects, and integer repetition factors.
The verbosity of this approach might be a litte tedious, but it would also be transparent, maintainable, and conflict free.
I think this is a very good idea. In fact, while working in PyTables I was lately pondering what would be the best way to define record arrays, and I also think that a verbose approach should be the beast.
After considering metaclasses, and tuples, I ended to a compromise solution between both which are dictionaries combined with some function or class to refine the definition.
My current thinking is something like:
recarrDescr = { "name" : defineType(CharType, 16, ""), # 16-character String "TDCcount" : defineType(UInt8, 1, 0), # unsigned byte "ADCcount" : defineType(Int16, 1, 0), # signed short integer "grid_i" : defineType(Int32, 1, 9), # integer "grid_j" : defineType(Int32, 1, 9), # integer "pressure" : defineType(Float32, 1, 1.), # float (single-precision) "temperature" : defineType(Float64, 32, arange(32)), # double[32] "idnumber" : defineType(Int64, 1, 0), # signed long long }
where defineType is a class that accepts (type, shape, default) parameters. It can be extended safely in the future if more needs appear.
You're way ahead of me here. The only thing I don't like about this is the additional relative complexity because of the addition of field names and default values. It would be nice to layer this more.
Perhaps you may want to consider this for using in recarray definition.
We'll definitely consider it as we hash this out.
I think we should add an "obsolescent feature" warning to numarray and recarray which flags any use of character typecodes when the appropriate command line switches are set.
Well, I don't fully agree with that. I do believe that classes typecodes to be a more meaningful way for describing types, but charcodes can be quite advantageous in certain situations, like in describing in compact way the contents of a record, or passing this info to C-routines to deal with the data.
Yeah, I know.
For example, consider the benefits of describing a recarray format as:
"3s4i20d"
I know.
instead of
((Int16, 3), (Int32, 4), (Float64, 20), )
This is pretty much exactly what I was thinking. It is straightforward to imagine and difficult to forget.
the former being more handy in lots of situations.
Would you please name some of these so we can explore handling them both ways?
I certainly believe that a coexistence of both can be very beneficious, specially for 3rd party extension makers (like me :).
If there's a reasonable way to avoid supporting both, we should.
Suggestion: if recarray charcodes are not necessary to match the Numeric ones, I propose that using the Python convention maybe a good idea. Look at the table in: http://www.python.org/doc/current/lib/module-struct.html.
This sounds good to me, except that it will break an existing interface that I don't have control over. Therefore, I suggest we correct the problem by coming up with something better.
Well, if charcodes finally stay in, this have an additional advantage in that python crew has provided meaningful ways to express padding (character "x"), endianess ("=", "<", ">") and alignment ("@").
We might also add these to the type-repetition tuple. Regards, Todd