more than 2-D numarrays in recarray?

Hi,
It seems that recarray doesn't support numarray arrays of more than one dimension as fields. Is that a fundamental limitation? If not, do you plan to support arbitrary dimensions in the future?
Thanks,
-- Francesc Alted

Francesc Alted wrote:
I don't think it is fundamental, merely a question of what is needed and works easily. I see two problems with multi-d numarray fields, both solvable:
1. Multidimensional numarrays must be described in the recarray spec.
2. Either numarray or recarray must be able to handle a (slightly) more complicated case of recomputing array strides from shape and (bytestride, record-length).
I didn't design or implement recarray so there may be other problems as well.
If not, do you plan to support arbitrary dimensions in the future?
I don't think it's a priority now. What do you need them for?
Thanks,
Regards, Todd
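Todd's second point, recomputing strides from the field shape and the record length, can be sketched in plain Python. This is only an illustration under assumptions: `field_strides` is a hypothetical helper, not part of numarray or recarray, and the record layout (one Int8 followed by a (3,4) block of Int32, packed without padding) is made up for the example.

```python
def field_strides(field_shape, itemsize, record_length):
    """Byte strides for viewing one multidimensional field across all records.

    The resulting view has shape (nrecords,) + field_shape.
    Hypothetical helper for illustration; not a numarray/recarray function.
    """
    inner = []
    step = itemsize
    # Walk the field's axes from last to first: row-major layout inside a record.
    for dim in reversed(field_shape):
        inner.append(step)
        step *= dim
    inner.reverse()
    # After the loop, step is the total number of bytes the field occupies.
    if step > record_length:
        raise ValueError("field does not fit in one record")
    # Moving to the same element in the next record skips one whole record.
    return (record_length,) + tuple(inner)


# Assumed packed record: one Int8 (1 byte) plus a (3, 4) block of Int32 (48 bytes).
record_length = 1 + 3 * 4 * 4
strides = field_strides((3, 4), 4, record_length)
```

The only difference from the 1-D case is the extra inner strides: the leading stride is still the record length, which is what a 1-D field already needs.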

On Tuesday 04 February 2003 13:19, Todd Miller wrote:
I had a look at the code and it seems like you are right.
I don't think it's a priority now. What do you need them for?
Well, I've adopted the recarray object (actually a slightly modified version of it) as a fundamental building block in the next release of PyTables. If arbitrary dimensionality were implemented, the resulting tables would be more general. Moreover, I'm thinking about implementing support for arrays with an unlimited dimension (just one axis), and having a degenerate recarray whose single column is a multidimensional numarray object would ease the implementation quite a lot. Of course, I could implement my own recarray version with that support, but I don't want to diverge that much from the reference implementation. -- Francesc Alted
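The single-column idea above can be sketched roughly as follows. This is a plain-Python illustration using the standard `array` module rather than numarray, and `ExtendableArray` is a hypothetical name, not a PyTables or recarray class: each record is one fixed-shape block, and only the first axis grows.

```python
from array import array
from functools import reduce
from operator import mul


class ExtendableArray:
    """Fixed-shape rows stored back to back; only the first axis can grow.

    Hypothetical illustration, not the PyTables or recarray implementation.
    """

    def __init__(self, typecode, row_shape):
        self.row_shape = tuple(row_shape)
        # Number of scalar items in one row (one "record").
        self.row_size = reduce(mul, self.row_shape, 1)
        self.data = array(typecode)

    def append(self, flat_row):
        """Append one row given as a flat sequence in row-major order."""
        flat_row = list(flat_row)
        if len(flat_row) != self.row_size:
            raise ValueError("expected %d values, got %d"
                             % (self.row_size, len(flat_row)))
        self.data.extend(flat_row)

    @property
    def shape(self):
        # The leading dimension is simply the number of rows stored so far.
        return (len(self.data) // self.row_size,) + self.row_shape

    def row(self, index):
        """Return row `index` as a flat list."""
        nrows = len(self.data) // self.row_size
        if not 0 <= index < nrows:
            raise IndexError("row index out of range")
        start = index * self.row_size
        return self.data[start:start + self.row_size].tolist()


table = ExtendableArray("i", (3, 4))
table.append(range(12))
table.append(range(12, 24))
```

Because every row has the same size, appending never touches existing data, which is why one unlimited axis is much easier to support than several.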

As Todd says, the initial implementation was meant to support only 1-d cases. There is no fundamental reason why it shouldn't support the general case. We'd like to work with you on how best to implement that. Basically the issue is how we save the shape information for that field. I don't think it would be hard to implement. Perry

On Tuesday 04 February 2003 16:40, Perry Greenfield wrote:
Ok, great! Well, my proposals for extended recarray syntax are:

1.- Extend the current formats to read something like:

    ['(1,)i1', '(3,4)i4', '(16,)a', '(2,3,4)i2']

  Pros:
  - It's the straightforward extension of the current format
  - Should be easy to implement
  - Note that the charcodes have been replaced by a slightly more verbose version ('i2' instead of 's', for example)
  - Short and simple
  Cons:
  - It is still string-code based
  - Implicit field order

2.- Make use of the syntax I suggested in past messages:

    class Particle(IsRecord):
        name     = Col(CharType, (16,), dflt="", pos=3)   # 16-character String
        ADCcount = Col(Int8, (1,), dflt=0, pos=1)         # signed byte
        TDCcount = Col(Int32, (3,4), dflt=0, pos=2)       # signed integer
        grid_i   = Col(Int16, (2,3,4), dflt=0, pos=4)     # signed short integer

  Pros:
  - It gets rid of charcodes or string codes
  - The map between name and type is visually clear
  - Explicit field order
  - The columns can be defined as __slots__ in the class constructor, making it impossible to assign values (through __setattr__, for example) to non-existing columns
  - Is elegant (IMO)
  Cons:
  - Requires more typing to define
  - Not as concise as 1) (but a short representation can be made inside IsRecord!)
  - Difficult to define dynamically

3.- Similar to 2), but with a dictionary like:

    Particle = {
        "name"     : Col(CharType, (16,), dflt="", pos=3),  # 16-character String
        "ADCcount" : Col(Int8, (1,), dflt=0, pos=1),        # signed byte
        "TDCcount" : Col(Int32, (3,4), dflt=0, pos=2),      # signed integer
        "grid_i"   : Col(Int16, (2,3,4), dflt=0, pos=4),    # signed short integer
    }

  Pros:
  - It gets rid of charcodes or string codes
  - The map between name and type is visually clear
  - Explicit field order
  - Easy to build dynamically
  Cons:
  - No possibility to define __slots__
  - Not as elegant as 2), but it looks fine

4.- List-based approach:

    Particle = [
        Col(Int8, (1,), dflt=0),         # signed byte
        Col(Int32, (3,4), dflt=0),       # signed integer
        Col(CharType, (16,), dflt=""),   # 16-character String
        Col(Int16, (2,3,4), dflt=0),     # signed short integer
    ]

  Pros:
  - Costs less to type (less verbose)
  - Easy to build dynamically
  Cons:
  - Implicit field order
  - Map between field names and contents not visually clear

Note: In the previous discussion explicit order was considered better than implicit, following the Python mantra, and although some people may think that this doesn't apply well here, I do (but, again, this is purely subjective).

Of course, a combination of two alternatives may be the best. My current experience tells me that a combination of 2 and 3 may be very good. That way, users can define their recarrays as classes, but if they need to define them dynamically, the recarray constructor can also accept a dictionary like 3 (but, obviously, the same applies to case 4). In the end, the recarray instance should have a variable that points to this definition class, where the metadata is kept, but a shortcut in the form of 1) can also be constructed for convenience.

IMO integrating options 2 and 3 (even 4) is not difficult to implement and, in fact, such a combination is already present in the PyTables CVS version. I might even provide a recarray version with 2 & 3 integrated for developers' evaluation.

Comments?
-- Francesc Alted
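To make the relation between proposals 1 and 3 concrete, here is a minimal sketch of how a dictionary description could be reduced to the short string form, and how such a string could be parsed back. Everything in it is an assumption for illustration: `Col` is a stand-in namedtuple rather than the PyTables class, type objects such as Int8 are replaced by the string codes from proposal 1, and `parse_format` and `short_formats` are hypothetical helpers.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the Col descriptor in the thread; the type is
# given as a string code ('i1', 'i4', 'a', ...) instead of a type object.
Col = namedtuple("Col", "code shape dflt pos")

# Matches strings such as '(1,)i1', '(3,4)i4' or '(16,)a'.
FORMAT_RE = re.compile(r"^\((\d+(?:,\d+)*),?\)([a-z]\d*)$")


def parse_format(fmt):
    """Split an extended format string into (shape, type code)."""
    match = FORMAT_RE.match(fmt.replace(" ", ""))
    if match is None:
        raise ValueError("bad format string: %r" % (fmt,))
    shape = tuple(int(n) for n in match.group(1).split(","))
    return shape, match.group(2)


def shape_text(shape):
    """Render a shape the way a Python tuple literal would look, without spaces."""
    if len(shape) == 1:
        return "(%d,)" % shape[0]
    return "(%s)" % ",".join(str(n) for n in shape)


def short_formats(description):
    """Turn a {name: Col} dictionary into (names, formats), ordered by pos."""
    ordered = sorted(description.items(), key=lambda item: item[1].pos)
    names = [name for name, col in ordered]
    formats = [shape_text(col.shape) + col.code for name, col in ordered]
    return names, formats


Particle = {
    "name":     Col("a",  (16,),     "", 3),  # 16-character string
    "ADCcount": Col("i1", (1,),      0,  1),  # signed byte
    "TDCcount": Col("i4", (3, 4),    0,  2),  # signed integer
    "grid_i":   Col("i2", (2, 3, 4), 0,  4),  # signed short integer
}

names, formats = short_formats(Particle)
```

With the explicit `pos` values, the dictionary yields the same field order and format list as the example given for proposal 1, which is the "shortcut" representation mentioned above.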
