Re: FW: [Numpy-discussion] typecodes in numarray

Jan. 25, 2003

      Francesc Alted wrote:
...
A Divendres 24 Gener 2003 21:15, Todd Miller va escriure:
...
...
My current thinking is something like:
recarrDescr = {
  "name"        : defineType(CharType, 16, ""),  # 16-character String
  "TDCcount"    : defineType(UInt8, 1, 0),    # unsigned byte
  "ADCcount"    : defineType(Int16, 1, 0),    # signed short integer
  "grid_i"      : defineType(Int32, 1, 9),    # integer
  "grid_j"      : defineType(Int32, 1, 9),    # integer
  "pressure"    : defineType(Float32, 1, 1.),  # float 
(single-precision) "temperature" : defineType(Float64, 32, arange(32)), 
# double[32] "idnumber"    : defineType(Int64, 1, 0),    # signed long
long }
Still think I'd prefer something seperable:
recarrStruct = (   (CharType, 16),
                            UInt8,
                            Int16,
                            Int32,
                            Int32,
                            Float32,
                            (Float64, 32),
                            Int64 )

recarrFields = ["name",
  "TDCcount",
  "ADCcount",
   "grid_i",
   "grid_j",
   "pressure",
   "temperature",
   "idnumber"]

I guess it might not be quite as good for large structs.
...
...
...
where defineType is a class that accepts (type, shape, default)
parameters. It can be extended safely in the future if more needs appear.
You're way ahead of me here.  The only thing I don't like about this is
the additional relative complexity because of the addition of field
names and default values.   It would be nice to layer this more.
Well, I think a map between field names and values is valuable from the
user's point of view. It may help him to label the different information on
the recarray. Moreover, if __getattr__ and __setattr__ methods (or
__getitem__ and __setitem__) would get implemented on recarray (as they are
in my recarray2 version, for example), the field name can become a very
convenient manner to access a specific field by name (this introduce the
limitation that field name must be a valid python identifier, but I think
this is not a big restriction). By looking at the description dictionary,
the user can have a quick idea of what he can find in every field (with no
need of counting, which can be a big advantage specially for long records).
That's true and sounds nice.  I'm just thinking records with named 
fields should be derived
from records with positional fields.  If the functionality is layered, 
 you can use as much
complexity as you need.

It's a good sign that both you and I thought of an identical tuple 
format; it's the obvious
minimal one.
...
With regard to default values, you can make this parameter (even the shape)
a keyword parameter in order to make it optional.
OK.  That's a good point.
...
...
One more thing I don't understand looking at this:  a dictionary is 
unordered.
Yeah, but this can be regarded as an advantage rather than a drawback in the
sense that you can choose the order you (the developer) prefer. For example,
I was using first a alphanumerical order to arrange the data fields, but
now, I'm considering that a arrangement that optimizes the alignment of the
fields could be far better. As for one, say that you have a (Int8, Int32,
Float64) record; in principle it could be easy to create a routine that
arranges this record in the form (Float64,Int32, Int8) that optimizes the
different field access (it may be even possible to introduce automatic
padding later on if recarrays would support them in the future).
Maybe you are getting confused
Yes and no. :)
...
in thinking that recarrDescr will create the
recarray. Not at all, this a *metadata* definition that can be passed to the
actual recarray funtion for recarray creation.
Just like the type repetition tuple except also including field names 
and default values.   I don't think you lost me.  For what we do,  the 
exact physical layout of the "struct" is important, so order matters.  I 
see order as part of the
meta-data,  but I don't usually deal with meta-entities so maybe I've 
got that part wrong.  :)
...
Its function would be
similar to the formats parameter (with typical values like "3a,4i,3w") in
recarray.array, but with more verbosity and all the reported advantages.
...
...
instead of
((Int16, 3),
(Int32, 4),
(Float64, 20),
)
This is pretty much exactly what I was thinking.  It is straightforward
to imagine and difficult to forget.
...
the former being more handy in lots of situations.
Would you please name some of these so we can explore handling them both
ways?
Well, I'm afraid that the best advantage would be when dealing with
recarrays in C extension modules. In this kind of situation it would be far
better to deal with a "3a4i3w" array than a tuple of python objects. But
maybe I'm wrong and the latter is not so-complicated to manage; however, I
used to work a lot with records (even before meeting recarray) and I was
quite comfortable with formats in string mode.
I was thinking that if the above was an issue,  we could write an API 
function(s) to "compile" the type-repetition tuple into arrays of ints 
which describe the type of each field and corresponding repetition factor.
...
Or perhaps it would be enough to provide a method for converting from the
standard metadata layout (dictionary or tuple or whatever), to a string
format. This should be not very difficult.
Almost exactly what I suggested above.

See you Monday,
Todd