Hi,

In the context of optimizing the PyTables support for numarray and recarray objects, I have been playing with the recarray module and ended up with a somewhat improved version of it.

Roughly, the modifications are:

- Addition of a cache to quickly access the columns (numarrays) in recarrays. This object is a map (dictionary) whose keys are the field names and whose values are references to the columns, regarded as numarray entities. This dictionary is accessible through the new attribute "_fields".

- Addition of an attribute for recarray objects named "_record", which points to a special object ("Record2" class) that is aware of the "_fields" cache. It can be used to access the different rows in recarray objects in an efficient way.

- The "_record" object is callable (it defines the "__call__" method) so as to select the recarray row that is active during access to the different fields.

Advantages

- Access to rows and columns (fields) in recarray objects is one order of magnitude faster (!).

- The new "_fields" and "_record" attributes provide convenient and intuitive ways to access the information in recarrays.

- The "_record" attribute supports the "__getattr__" and "__setattr__" methods, which are very convenient for accessing fields in a row.

Drawbacks

- The "_record" attribute always points to the same object, and you must pass it the row on which you want to operate. So, if you want two different objects pointing to different rows, you can't use the "_record" attribute to get them (but you can still use the existing Record class by calling the "__getitem__" method of a recarray object).

- Two new attributes are added to the already large number of recarray variables. However, these new variables have no special space requirements, as the "_record" object has only three scalar variables and "_fields" is a dictionary with as many entries as there are fields in the recarray, which should not be a large number.

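To make the idea concrete, here is a minimal sketch of the pattern described above, written in plain Python with lists standing in for numarray columns. It is not the attached implementation: the "ToyRecArray" class and its constructor are hypothetical names invented for illustration, and this toy "Record2" keeps less state than the real one. Only the "_fields", "_record" and "Record2" names come from the description above.

```python
class Record2:
    """Toy row cursor: a single shared object, re-pointed at a row by calling it."""

    def __init__(self, fields):
        # Use object.__setattr__ so internal state bypasses the
        # field-writing __setattr__ defined below.
        object.__setattr__(self, "_fields", fields)
        object.__setattr__(self, "_row", 0)

    def __call__(self, row):
        # Select the active row and return the same cursor object.
        object.__setattr__(self, "_row", row)
        return self

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for field names.
        try:
            return self._fields[name][self._row]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Write into the cached column at the active row.
        if name not in self._fields:
            raise AttributeError(name)
        self._fields[name][self._row] = value


class ToyRecArray:
    """Hypothetical stand-in for a recarray: columns are plain Python lists."""

    def __init__(self, **columns):
        self._fields = dict(columns)           # field name -> column
        self._record = Record2(self._fields)   # shared row cursor


table = ToyRecArray(x=[1, 2, 3], y=[10.0, 20.0, 30.0])

# Select row 1, then write one of its fields by attribute.
table._record(1).y = 25.0

# The drawback noted above: both names refer to the same cursor,
# so after the second call they both point at row 2.
a = table._record(0)
b = table._record(2)
same_cursor = a is b
```

In this sketch the column lookup is a single dictionary access, which is the kind of shortcut the "_fields" cache is meant to provide; the sharing of one cursor object is what makes it impossible to hold two rows at once through "_record".
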
I'm attaching this modified version as well as a testbed program to test the new access methods and the improved performance. The output of this program, run on a Pentium 4 @ 2 GHz machine, is also included. Feel free to play with it and/or take/adapt the parts you consider better suited to the recarray module.

--
Francesc Alted
PGP KeyID: 0x61C8C11F