[Numpy-discussion] Non-numerical info associated with sub-arrays

Tim Churches tchur at optushome.com.au
Fri Dec 27 15:50:02 EST 2002


On Fri, 2002-12-27 at 12:55, Magnus Lie Hetland wrote:
> Tim Churches <tchur at optushome.com.au>:
> [snip]
> > Have a look at the discussion on RecordArrays in this overview of
> > Numarray: http://stsdas.stsci.edu/numarray/DesignOverview.html
> 
> Sounds interesting.
> 
> > However, in the meantime, as you note, it's not too hard to write a class
> > which emulates R/S-Plus data frames. Just store each column in its own
> > Numeric array of the appropriate type
> 
> Yeah -- it's just that I'd like to keep a set of columns collected as
> a two-dimensional array, to allow horizontal summing and the like.
> (Not much more complicated, but an extra issue to address.)
> 
> > (which might be the PyObject
> > types, which can hold any Python object type),
> 
> Hm. Yes. I can't seem to find these anymore. I seem to recall using
> type='o' or something in Numeric, but I can't find the right type
> objects now... (Guess I'm just reading the docs and dir(numeric)
> poorly...) It would be nice if array(['foo']) just worked. Oh, well.

Just like this:

>>> import Numeric
>>> a = Numeric.array(['a','b','c'],typecode=Numeric.PyObject)
>>> a
array([a , b , c ],'O')
>>>
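
For anyone without Numeric to hand, here is a rough sketch of the same idea in present-day NumPy, assuming `dtype=object` plays the role of `Numeric.PyObject`. The `ColumnFrame` class and its `row_sums` method are hypothetical names made up for illustration, not part of any library. Each column lives in its own array, and selected numeric columns are stacked into a two-dimensional array only when a horizontal sum is needed.

```python
import numpy as np


class ColumnFrame:
    """Toy data frame: one array per column (hypothetical, for illustration)."""

    def __init__(self, **columns):
        self.columns = {}
        for name, values in columns.items():
            arr = np.asarray(values)
            # Non-numeric columns (e.g. strings) are kept as object arrays.
            if arr.dtype.kind not in "biuf":
                arr = np.array(values, dtype=object)
            self.columns[name] = arr
        lengths = {len(col) for col in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")

    def row_sums(self, names):
        """Sum the named numeric columns across each row."""
        block = np.column_stack([self.columns[n] for n in names])
        return block.sum(axis=1)


frame = ColumnFrame(label=["a", "b", "c"], x=[1, 2, 3], y=[0.5, 1.5, 2.5])
print(frame.columns["label"].dtype)   # dtype of the string column
print(frame.row_sums(["x", "y"]))     # row-wise totals of x and y
```

Stacking on demand keeps the per-column types intact while still allowing row-wise operations over whichever numeric columns are chosen.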

> 
> > By memory-mapping disc-based
> > versions of the Numeric arrays, and using the BsdDb3 record number
> > database format for the string columns, you can even make a disc-based
> > "record array" which can be larger than available RAM+swap.
> 
> Sounds quite useful, although quite similar to MetaKit. (I suppose I
> could use some functions from numarray on columns in MetaKit... But
> that might just be too weird -- and it would still just be a
> collection of columns :])

I really like MetaKit's column-based storage, but it just doesn't scale
well (by the author's own admission, and verified empirically): beyond a
few 10**5 records it bogs down terribly, whereas memory-mapped NumPy
plus a BsdDb3 recno database for strings scales well to many tens of
millions of records (or more, but that's as far as I have tested).
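
A minimal sketch of that layout, under stated assumptions: it uses modern NumPy's `numpy.memmap` for the numeric column, and the standard-library `dbm` module stands in for the BsdDb3 recno database (a substitution for illustration, not the setup described above). The file names, column names and record count are all made up.

```python
import dbm
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()
n_records = 1000  # illustrative size only

# Numeric column: a memory-mapped array backed by a file on disk.
ages_path = os.path.join(workdir, "ages.dat")
ages = np.memmap(ages_path, dtype=np.int32, mode="w+", shape=(n_records,))
ages[:] = np.arange(n_records) % 90
ages.flush()

# String column: a key-value store keyed by record number
# (dbm here is an assumed stand-in for a BsdDb3 recno database).
names_path = os.path.join(workdir, "names")
with dbm.open(names_path, "c") as names:
    for i in range(n_records):
        names[str(i)] = f"person-{i}"

# Reopen both read-only and fetch one "record" by row number.
ages_ro = np.memmap(ages_path, dtype=np.int32, mode="r", shape=(n_records,))
with dbm.open(names_path, "r") as names:
    row = 123
    record = (names[str(row)].decode(), int(ages_ro[row]))

print(record)
```

Only the pages of the numeric file that are actually touched get read into memory, and strings are fetched one record at a time, which is what lets the data on disk exceed available RAM.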

Tim C





