[Numpy-discussion] more than 2-D numarrays in recarray?
Francesc Alted
falted at openlc.org
Tue Feb 4 10:06:03 EST 2003
On Tuesday 04 February 2003 16:40, Perry Greenfield wrote:
> We'd like to work with you about how that should be best implemented.
> Basically the issue is how we save the shape information for that field.
> I don't think it would be hard to implement.
Ok, great!
Well, my proposals for extended recarray syntax are:
1.- Extend the current formats to read something like:
['(1,)i1', '(3,4)i4', '(16,)a', '(2,3,4)i2']
Pro's:
- It's the straightforward extension of the current format
- Should be easy to implement
- Note that the charcodes have been replaced by a slightly more verbose
version ('i2' instead of 's', for example)
- Short and simple
Con's:
- It is still string-code based
- Implicit field order
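To illustrate how little machinery option 1 would need, here is a minimal sketch of a parser for the proposed strings. The function name `parse_format` and the regular expression are assumptions made for this example, not part of recarray.

```python
import re

# Hypothetical pattern for the proposed "(shape)typecode" strings.
_FORMAT_RE = re.compile(r"^\((?P<shape>[\d,\s]*)\)(?P<code>[A-Za-z]\w*)$")


def parse_format(fmt):
    """Split a string such as '(3,4)i4' into a shape tuple and a type code."""
    match = _FORMAT_RE.match(fmt)
    if match is None:
        raise ValueError("unrecognized format: %r" % (fmt,))
    # Skip the empty piece left by the trailing comma in '(16,)'.
    shape = tuple(int(n) for n in match.group("shape").split(",") if n.strip())
    return shape, match.group("code")


formats = ['(1,)i1', '(3,4)i4', '(16,)a', '(2,3,4)i2']
parsed = [parse_format(f) for f in formats]
```

The field order is still implied by the position of each string in the list, which is the second con above.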
2.- Use the syntax I suggested in past messages:
    class Particle(IsRecord):
        name     = Col(CharType, (16,), dflt="", pos=3)   # 16-character String
        ADCcount = Col(Int8, (1,), dflt=0, pos=1)         # signed byte
        TDCcount = Col(Int32, (3,4), dflt=0, pos=2)       # signed integer
        grid_i   = Col(Int16, (2,3,4), dflt=0, pos=4)     # signed short integer
Pro's:
- It gets rid of charcodes or string codes
- The map between name and type is visually clear
- Explicit field order
- The columns can be defined as __slots__ in the class constructor,
making it impossible to assign values (through __setattr__, for example) to
non-existing columns.
- Is elegant (IMO)
Con's:
- Requires more typing to define
- Not as concise as 1) (but a short representation can be made inside
IsRecord!)
- Difficult to define dynamically
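The __slots__ point can be sketched as follows. This is only an illustration in present-day Python 3 syntax: `MetaRecord`, the `_columns` attribute and the string placeholders for the type objects are assumptions for the example, and the real `IsRecord` and `Col` may be built quite differently.

```python
class Col:
    """Hypothetical column descriptor: type, shape, default value, position."""

    def __init__(self, type_, shape, dflt=None, pos=None):
        self.type = type_
        self.shape = shape
        self.dflt = dflt
        self.pos = pos


# Placeholders standing in for the type objects named in the message.
CharType, Int8, Int16, Int32 = "CharType", "Int8", "Int16", "Int32"


class MetaRecord(type):
    """Turn the Col attributes of a class body into __slots__."""

    def __new__(mcls, name, bases, namespace):
        columns = {k: v for k, v in namespace.items() if isinstance(v, Col)}
        for key in columns:
            # A class attribute may not share its name with a slot.
            del namespace[key]
        # Slots follow the explicit 'pos' order, not the order of definition.
        namespace["__slots__"] = tuple(
            sorted(columns, key=lambda k: columns[k].pos))
        namespace["_columns"] = columns
        return super().__new__(mcls, name, bases, namespace)


class IsRecord(metaclass=MetaRecord):
    def __init__(self):
        # Fill every column with its declared default.
        for name, col in self._columns.items():
            setattr(self, name, col.dflt)


class Particle(IsRecord):
    name = Col(CharType, (16,), dflt="", pos=3)
    ADCcount = Col(Int8, (1,), dflt=0, pos=1)
    TDCcount = Col(Int32, (3, 4), dflt=0, pos=2)
    grid_i = Col(Int16, (2, 3, 4), dflt=0, pos=4)


p = Particle()
p.ADCcount = 5        # fine: ADCcount is a declared column
try:
    p.energy = 1.0    # not a declared column
except AttributeError:
    print("energy is not a column of Particle")
```

Because neither `IsRecord` nor `Particle` has an instance `__dict__`, the assignment to an undeclared name fails without any custom `__setattr__`.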
3.- Similar to 2), but with a dictionary like:
    Particle = {
        "name"     : Col(CharType, (16,), dflt="", pos=3),   # 16-character String
        "ADCcount" : Col(Int8, (1,), dflt=0, pos=1),         # signed byte
        "TDCcount" : Col(Int32, (3,4), dflt=0, pos=2),       # signed integer
        "grid_i"   : Col(Int16, (2,3,4), dflt=0, pos=4),     # signed short integer
    }
Pro's:
- It gets rid of charcodes or string codes
- The map between name and type is visually clear
- Explicit field order
- Easy to build dynamically
Con's:
- No possibility to define __slots__
- Not as elegant as 2), but it looks fine.
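A short sketch of why the dictionary form is easy to build dynamically, and why it needs the explicit `pos`: the dictionaries of the Python versions current when this message was written did not keep insertion order. The `Col` namedtuple, the `spec` list and `ordered_fields` are stand-ins invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for the Col descriptor used in the message.
Col = namedtuple("Col", ["type", "shape", "dflt", "pos"])


def ordered_fields(definition):
    """Return the field names sorted by their explicit 'pos' value."""
    return sorted(definition, key=lambda name: definition[name].pos)


# A description that might come from a file header or user input.
spec = [
    ("ADCcount", "Int8", (1,)),
    ("TDCcount", "Int32", (3, 4)),
    ("name", "CharType", (16,)),
    ("grid_i", "Int16", (2, 3, 4)),
]

# Build the dictionary definition at run time.
particle = {}
for position, (field, type_name, shape) in enumerate(spec, start=1):
    default = "" if type_name == "CharType" else 0
    particle[field] = Col(type_name, shape, dflt=default, pos=position)

print(ordered_fields(particle))
```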
4.- List-based approach:
    Particle = [
        Col(Int8, (1,), dflt=0),         # signed byte
        Col(Int32, (3,4), dflt=0),       # signed integer
        Col(CharType, (16,), dflt=""),   # 16-character String
        Col(Int16, (2,3,4), dflt=0),     # signed short integer
    ]
Pro's:
- Costs less to type (less verbose)
- Easy to build dynamically
Con's:
- Implicit field order
- Map between field names and contents not visually clear
Note: In the previous discussion explicit order has been considered better
than implicit, following the Python mantra, and although some people may
think that this doesn't apply well here, I think it does (but, again, this is
purely subjective).
Of course, a combination of two alternatives may be best. My current
experience tells me that a combination of 2 and 3 may be very good. That
way, users can define their recarrays as classes, but if they need to define
them dynamically, the recarray constructor can also accept a dictionary as in
3 (though, obviously, the same applies to case 4).
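One way a constructor could accept all of these forms is to normalize them first. The sketch below is an assumption about how that might look: `normalize`, the namedtuple `Col` and the placeholder names `c1`, `c2`, ... for the list form are invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for the Col descriptor used in the message.
Col = namedtuple("Col", ["type", "shape", "dflt", "pos"])


def normalize(definition):
    """Return a list of (name, Col) pairs in field order."""
    if isinstance(definition, (list, tuple)):
        # Option 4: order is implicit and names are only placeholders.
        return [("c%d" % i, col) for i, col in enumerate(definition, start=1)]
    if isinstance(definition, dict):
        # Option 3: a plain dictionary of name -> Col.
        items = definition.items()
    else:
        # Option 2: a class whose attributes are Col instances.
        items = ((k, v) for k, v in vars(definition).items()
                 if isinstance(v, Col))
    return sorted(items, key=lambda item: item[1].pos)


class Particle:
    name = Col("CharType", (16,), "", 3)
    ADCcount = Col("Int8", (1,), 0, 1)


as_dict = {
    "name": Col("CharType", (16,), "", 3),
    "ADCcount": Col("Int8", (1,), 0, 1),
}

# Both spellings describe the same record layout.
assert normalize(Particle) == normalize(as_dict)
```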
In the end, the recarray instance should have a variable that points to this
definition class, where the metadata is kept, but a shortcut in the form of
1) can also be constructed for convenience.
IMO integrating options 2 and 3 (even 4) is not difficult to implement and,
in fact, such a combination is already present in the PyTables CVS version. I
might even provide a recarray version with 2 & 3 integrated for developers'
evaluation.
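As a rough sketch of that shortcut, the strings of form 1) can be derived from a definition once the columns are sorted by position. The `TYPE_CODES` mapping is inferred from the examples in this message, and `Col`, `format_shape` and `short_formats` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-in for the Col descriptor used in the message.
Col = namedtuple("Col", ["type", "shape", "dflt", "pos"])

# Mapping inferred from the examples above; an assumption, not a real table.
TYPE_CODES = {"Int8": "i1", "Int16": "i2", "Int32": "i4", "CharType": "a"}


def format_shape(shape):
    """Render (16,) as '(16,)' and (3, 4) as '(3,4)'."""
    inner = ",".join(str(n) for n in shape)
    if len(shape) == 1:
        inner += ","  # keep the one-element tuple spelling
    return "(%s)" % inner


def short_formats(definition):
    """Build the compact format list from a name -> Col dictionary."""
    columns = sorted(definition.values(), key=lambda col: col.pos)
    return [format_shape(col.shape) + TYPE_CODES[col.type] for col in columns]


particle = {
    "name": Col("CharType", (16,), "", 3),
    "ADCcount": Col("Int8", (1,), 0, 1),
    "TDCcount": Col("Int32", (3, 4), 0, 2),
    "grid_i": Col("Int16", (2, 3, 4), 0, 4),
}

print(short_formats(particle))
```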
Comments?
--
Francesc Alted