[Numpy-discussion] more than 2-D numarrays in recarray?

Tue Feb 4 10:06:03 EST 2003

On Tuesday 04 February 2003 16:40, Perry Greenfield wrote:
> We'd like to work with you about how that should be best implemented.
> Basically the issue is how we save the shape information for that field.
> I don't think it would be hard to implement.

Ok, great!

Well, my proposals for extended recarray syntax are:

1.- Extend the current formats to read something like:
['(1,)i1', '(3,4)i4', '(16,)a', '(2,3,4)i2']

Pro's:
    - It's the straightforward extension of the current format
    - Should be easy to implement
    - Note that the charcodes have been replaced by a slightly more verbose
      version ('i2' instead of 's', for example)
    - Short and simple

Con's:
    - It is still string-code based
    - Implicit field order
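
To make option 1 concrete, here is a rough sketch of how such an extended
format string could be split into a shape and a type code. The name
parse_format and the regular expression are hypothetical (they are not part
of recarray), and treating a missing shape prefix as (1,) is my assumption.

```python
import re

# Hypothetical pattern: an optional "(d1,d2,...)" shape prefix, then a type code.
_FORMAT_RE = re.compile(r"^\s*(?:\(([\d,\s]*)\))?\s*([A-Za-z]\w*)\s*$")


def parse_format(fmt):
    """Split a format such as '(3,4)i4' into (shape, typecode)."""
    match = _FORMAT_RE.match(fmt)
    if match is None:
        raise ValueError("invalid format: %r" % (fmt,))
    dims, typecode = match.groups()
    if dims is None:
        # Assumption: no shape prefix means a single element per field.
        shape = (1,)
    else:
        # "1," splits into ["1", ""], so empty pieces are skipped.
        shape = tuple(int(d) for d in dims.split(",") if d.strip())
    return shape, typecode


formats = ['(1,)i1', '(3,4)i4', '(16,)a', '(2,3,4)i2']
parsed = [parse_format(f) for f in formats]
```

The field order is still implicit here: it is simply the order of the list.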

2.- Make use of the syntax I suggested in past messages:

class Particle(IsRecord):
    name        = Col(CharType, (16,), dflt="", pos=3)  # 16-character String
    ADCcount    = Col(Int8, (1,), dflt=0, pos=1)    # signed byte
    TDCcount    = Col(Int32, (3,4), dflt=0, pos=2)    # signed integer
    grid_i      = Col(Int16, (2,3,4), dflt=0, pos=4)    # signed short integer

Pro's:
    - It gets rid of charcodes or string codes
    - The map between name and type is visually clear
    - Explicit field order
    - The columns can be defined as __slots__ in the class constructor,
      making it impossible to assign (through __setattr__ for example) values
      to non-existing columns.
    - Is elegant (IMO)

Con's:
    - Requires more typing to define
    - Not as concise as 1) (but a short representation can be made inside
      IsRecord!)
    - Difficult to define dynamically
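
A minimal sketch of how option 2 might work, written in present-day Python 3
syntax. This is not the PyTables code: Col, MetaIsRecord, the string
stand-ins for the type objects, and the _columns/_names attributes are all
assumptions made for illustration. The metaclass collects the Col
attributes, orders them by pos, and turns the names into __slots__.

```python
class Col:
    """Hypothetical column description: type, shape, default and position."""

    def __init__(self, type_, shape=(1,), dflt=None, pos=None):
        self.type = type_
        self.shape = shape
        self.dflt = dflt
        self.pos = pos


# Stand-ins for the real type objects (assumption for this sketch).
CharType, Int8, Int16, Int32 = "CharType", "Int8", "Int16", "Int32"


class MetaIsRecord(type):
    def __new__(mcls, name, bases, namespace):
        cols = {k: v for k, v in namespace.items() if isinstance(v, Col)}
        # Explicit field order: sort the column names by their pos value.
        names = tuple(sorted(cols, key=lambda k: cols[k].pos))
        # Class attributes would clash with slots of the same name.
        for k in cols:
            del namespace[k]
        namespace["__slots__"] = names
        namespace["_columns"] = cols
        namespace["_names"] = names
        return super().__new__(mcls, name, bases, namespace)


class IsRecord(metaclass=MetaIsRecord):
    def __init__(self, **values):
        # Fill every column with the given value or its default.
        for name in self._names:
            setattr(self, name, values.get(name, self._columns[name].dflt))


class Particle(IsRecord):
    name = Col(CharType, (16,), dflt="", pos=3)
    ADCcount = Col(Int8, (1,), dflt=0, pos=1)
    TDCcount = Col(Int32, (3, 4), dflt=0, pos=2)
    grid_i = Col(Int16, (2, 3, 4), dflt=0, pos=4)
```

Because neither IsRecord nor Particle has an instance __dict__, assigning to
a column that was not declared raises AttributeError.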

3.- Similar to 2), but with a dictionary like:

Particle = {
    "name"      : Col(CharType, (16,), dflt="", pos=3),  # 16-character String
    "ADCcount"  : Col(Int8, (1,), dflt=0, pos=1),    # signed byte
    "TDCcount"  : Col(Int32, (3,4), dflt=0, pos=2),    # signed integer
    "grid_i"    : Col(Int16, (2,3,4), dflt=0, pos=4),    # signed short integer
    }

Pro's:
    - It gets rid of charcodes or string codes
    - The map between name and type is visually clear
    - Explicit field order
    - Easy to build dynamically
Con's:
    - No possibility to define __slots__
    - Not as elegant as 2), but it looks fine.
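
Since the dictionary itself should not be relied on for ordering, the pos
argument is what fixes the field order. A small sketch of how a constructor
might recover that order; the namedtuple-based Col, the string stand-ins for
the types, and the helper name ordered_fields are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the Col class used in the message.
Col = namedtuple("Col", "type shape dflt pos")

# Stand-ins for the real type objects (assumption for this sketch).
CharType, Int8, Int16, Int32 = "CharType", "Int8", "Int16", "Int32"

Particle = {
    "name": Col(CharType, (16,), dflt="", pos=3),
    "ADCcount": Col(Int8, (1,), dflt=0, pos=1),
    "TDCcount": Col(Int32, (3, 4), dflt=0, pos=2),
    "grid_i": Col(Int16, (2, 3, 4), dflt=0, pos=4),
}


def ordered_fields(description):
    """Return (name, Col) pairs sorted by the explicit pos of each column."""
    return sorted(description.items(), key=lambda item: item[1].pos)


field_names = [name for name, col in ordered_fields(Particle)]
```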

4.- List-based approach:

Particle = [
    Col(Int8, (1,), dflt=0),    # signed byte
    Col(Int32, (3,4), dflt=0),    # signed integer
    Col(CharType, (16,), dflt=""),  # 16-character String
    Col(Int16, (2,3,4), dflt=0),    # signed short integer
    ]

Pro's:
    - Costs less to type (less verbose)
    - Easy to build dynamically
Con's:
    - Implicit field order
    - Map between field names and contents not visually clear

Note: In the previous discussion explicit order has been considered better
than implicit, following the Python mantra, and although some people may
think that this doesn't apply well here, I think it does (but, again, this is
purely subjective).

Of course, a combination of two alternatives may be best. My current
experience tells me that a combination of 2 and 3 may be very good. That
way, users can define their recarrays as classes, but if they need to define
them dynamically, the recarray constructor can also accept a dictionary like
3 (but, obviously, the same applies to case 4).

In the end, the recarray instance should have a variable that points to this
definition class, where the metadata is kept, but a shortcut in the form
1) can also be constructed for convenience.
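
As an illustration of that shortcut, the sketch below derives the form-1
strings from a list-style definition like the one in option 4. The
TYPE_CODES mapping, the namedtuple-based Col, and the function names are
assumptions for this example; only the resulting strings come from the
proposal itself.

```python
from collections import namedtuple

# Hypothetical stand-in for Col, without pos (list order is the field order).
Col = namedtuple("Col", "type shape dflt")

# Assumed mapping from type names to the string codes of form 1.
TYPE_CODES = {"Int8": "i1", "Int16": "i2", "Int32": "i4", "CharType": "a"}


def shape_prefix(shape):
    """Format a shape tuple compactly: (1,) -> '(1,)', (3, 4) -> '(3,4)'."""
    if len(shape) == 1:
        return "(%d,)" % shape[0]
    return "(%s)" % ",".join(str(d) for d in shape)


def short_formats(columns):
    """Build the form-1 format list from an ordered sequence of columns."""
    return [shape_prefix(c.shape) + TYPE_CODES[c.type] for c in columns]


Particle = [
    Col("Int8", (1,), dflt=0),
    Col("Int32", (3, 4), dflt=0),
    Col("CharType", (16,), dflt=""),
    Col("Int16", (2, 3, 4), dflt=0),
]

formats = short_formats(Particle)
```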

IMO integrating options 2 and 3 (even 4) is not difficult to implement and
in fact, such a combination is already present in the PyTables CVS version. I
might even provide a recarray version with 2 & 3 integrated for developers'
evaluation.

Comments?

-- 
Francesc Alted