[Numpy-discussion] Non-numerical info associated with sub-arrays

Tim Churches tchur at optushome.com.au
Fri Dec 27 14:13:02 EST 2002

On Fri, 2002-12-27 at 11:29, Magnus Lie Hetland wrote:
> I'm working on some two-dimensional tables of data, where some data
> are numerical, while others aren't. I'd like to use numarray's
> numerical capabilities with the numerical parts (columns) while
> keeping the data in each row together. (I'm sure this generalizes to
> more dimensions, and to sub-arrays in general, not just rows.)
> It's not a hard problem, really, but the obvious solution--to keep
> the other rows in separate arrays/lists and just juggle things
> around--seems a bit clunky. I was just wondering if anyone had other
> ideas (would it be practical to include all the data in a single array
> somehow--I seem to recall that Numeric could have arbitrary object
> arrays, but I'm not sure whether numarray supports this?) or perhaps
> some hints on how to organize code around this? I wrote a small class
> that wraps things up and works a bit like R/S-plus's data frames; is
> there some other more standard code out there for this sort of thing?
> (It's a problem that crops up often in data processing of various
> kinds...)

Have a look at the discussion on RecordArrays in this overview of
Numarray: http://stsdas.stsci.edu/numarray/DesignOverview.html

However, in the meantime, as you note, it's not too hard to write a class
which emulates R/S-Plus data frames. Just store each column in its own
Numeric array of the appropriate type (which might be the PyObject
types, which can hold any Python object type), and have the wrapper
class implement __getitem__ etc. to collect the relevant "rows" from each
column and return them as a complete row, either a dict or a sequence. Not
that fast, but not slow either. You can implement a generator to allow
cursor-like traversal of all the rows if you like. Happy to
collaborate on furthering this idea. By memory-mapping disc-based
versions of the Numeric arrays, and using the BsdDb3 record number
database format for the string columns, you can even make a disc-based
"record array" which can be larger than available RAM+swap. I hope to
release code written under contract by Dave Cole (see
http://www.object-craft.com.au ) which illustrates this idea in the next
month or so (but I've been saying that to myself for a year or more...).
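
As a rough illustration of the column-per-array wrapper described above,
here is a minimal sketch. It is not the code mentioned above; the class
name ColumnTable and its rows() method are hypothetical, and the standard
library's array module stands in for Numeric arrays so that the example
runs without extra packages. Numeric (or numarray) arrays should slot into
the same structure, since all the wrapper needs from a column is len() and
integer indexing.

```python
from array import array


class ColumnTable:
    """Hypothetical data-frame-like wrapper: one container per column."""

    def __init__(self, **columns):
        # Every column must have the same number of entries.
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = columns
        self._nrows = lengths.pop() if lengths else 0

    def __len__(self):
        return self._nrows

    def __getitem__(self, key):
        # A string selects a whole column; an integer collects one row.
        if isinstance(key, str):
            return self._columns[key]
        if isinstance(key, int):
            if not -self._nrows <= key < self._nrows:
                raise IndexError("row index out of range")
            return {name: col[key] for name, col in self._columns.items()}
        raise TypeError("key must be a column name or a row index")

    def rows(self):
        # Generator giving cursor-like traversal of all the rows.
        for i in range(self._nrows):
            yield self[i]


table = ColumnTable(
    name=["a", "b", "c"],                # non-numerical column: list of objects
    height=array("d", [1.5, 2.0, 2.5]),  # numerical column: typed doubles
    count=array("l", [3, 4, 5]),         # numerical column: typed integers
)
```

With this, table["height"] hands back the typed column for numerical work,
table[1] returns one row as a dict, and table.rows() walks the rows in
order. Returning rows as dicts rather than tuples is just one of the two
options mentioned above.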

Tim C
