[Numpy-discussion] Object arrays for numarray / What do you use Numeric object arrays for?

Todd Miller jmiller at stsci.edu
Wed Jul 16 15:37:04 EDT 2003


On Wed, 2003-07-16 at 17:43, Tim Churches wrote:
> On Wed, 2003-07-16 at 05:34, Todd Miller wrote:
> > I am adding arrays of Python objects to numarray and so I am curious
> > about the uses people have found for Numeric's object arrays.  If you
> > have found Numeric's object arrays useful,  please tell us about what
> > you used them for so that we can make certain that numarray can satisfy
> > the same need.
> 
> We use NumPy to store vectors (rank-1 arrays) of numbers representing
> columns in a dataset. The NumPy arrays (which are large and numerous)
> are memory-mapped (using an extension) to disc to conserve real memory.
> However, in some vectors (columns) we need to store variable-length
> strings, and in others, variable-length sequences of integers or floats
> (and possibly even sets in the future). NumPy's object arrays are more
> memory-efficient than Python lists of lists or lists of strings for

Well, right now the prototype actually uses a single list internally as
the object store;  still, we might beat out lists of lists by a small
margin.

> these purposes, and of course they support NumPy functions such as
> take(), which makes life simpler. 

The prototype currently uses common code for put/take on strings, object
arrays, and soon record arrays.  The common code is currently a Python
prototype.  Numarray's numeric arrays use specialized C code for speed.
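
As a rough illustration of what take/put over a list-backed object store
could look like, here is a minimal sketch in plain Python.  The class
name ObjectStore and its methods are hypothetical and are not numarray's
actual API; the sketch only assumes a single flat list as the store.

```python
class ObjectStore:
    """Hypothetical list-backed object array; not numarray's real class."""

    def __init__(self, items):
        # A single flat Python list serves as the object store.
        self._items = list(items)

    def take(self, indices):
        # Gather: return a new store holding the elements at `indices`.
        return ObjectStore(self._items[i] for i in indices)

    def put(self, indices, values):
        # Scatter: assign values[k] to position indices[k], in place.
        for i, v in zip(indices, values):
            self._items[i] = v

    def tolist(self):
        # Return a shallow copy of the stored objects.
        return list(self._items)


# A column mixing strings and variable-length sequences.
col = ObjectStore(["a", [1, 2, 3], (4.0, 5.0), "longer string"])
picked = col.take([3, 1])
col.put([0, 2], ["z", [9]])
```

Because the elements are ordinary Python references, the same two loops
work unchanged for strings, sequences, or any other object.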

> But we haven't been able to memory-map
> these object arrays, which is a problem. Is there any prospect of
> numarray supporting memory-mapped arrays of sequences/strings?

numarray supports arrays of fixed-length strings through its chararray
module.  By default, chararray's stripping and padding functions
blank-fill unused space, which gives the appearance of variable-length
strings.  The data buffers of all of numarray's classes which represent
primitive data items (numbers, strings, records) can be memory mapped.
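
The padding/stripping idea can be sketched with nothing but Python's
standard mmap module.  This is not chararray's code; the item width of 8
bytes and the helper names write_fixed and read_item are assumptions
made for the example.

```python
import mmap
import os
import tempfile

WIDTH = 8  # assumed fixed item size in bytes


def write_fixed(path, strings, width=WIDTH):
    # Store each string in a fixed-width slot, blank-filling unused space.
    with open(path, "wb") as f:
        for s in strings:
            data = s.encode("ascii")
            if len(data) > width:
                raise ValueError("string longer than item width")
            f.write(data.ljust(width, b" "))


def read_item(buf, i, width=WIDTH):
    # Slice out slot i and strip the padding on the way out.
    return buf[i * width:(i + 1) * width].rstrip(b" ").decode("ascii")


fd, path = tempfile.mkstemp()
os.close(fd)
write_fixed(path, ["ab", "numarray", ""])

with open(path, "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    items = [read_item(buf, i) for i in range(3)]
    buf.close()
os.remove(path)
```

One consequence of this scheme is that trailing blanks which were part of
the original string cannot be told apart from padding and are lost on
read.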

I think however that memory mapping sequences or arbitrary Python
objects isn't going to happen in numarray any time soon;  it sounds too
much like object persistence.

> I know
> that is a big ask! We have an extension module which stores variable
> length blobs in a single memory-mapped file which might be useful - the
> code could be made available to the numarray project, I think.

I don't understand the difference between your module and Python's mmap.
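
For comparison, this is roughly how I would imagine variable-length
blobs being layered on top of plain mmap: an offsets table kept beside
one memory-mapped data file.  It is only a guess at the general
technique, not a description of your extension, and the names
write_blobs and read_blob are made up for the example.

```python
import mmap
import os
import tempfile


def write_blobs(path, blobs):
    # Append blobs back to back; offsets[i]:offsets[i+1] delimits blob i.
    offsets = [0]
    with open(path, "wb") as f:
        for b in blobs:
            f.write(b)
            offsets.append(offsets[-1] + len(b))
    return offsets


def read_blob(buf, offsets, i):
    # Slicing the mmap copies out just the bytes of blob i.
    return buf[offsets[i]:offsets[i + 1]]


fd, path = tempfile.mkstemp()
os.close(fd)
blobs = [b"x", b"", b"hello world"]
offsets = write_blobs(path, blobs)

with open(path, "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    recovered = [read_blob(buf, offsets, i) for i in range(len(blobs))]
    buf.close()
os.remove(path)
```

In this sketch the offsets list lives in ordinary memory; it is all
integers, so it could itself be a memory-mapped numeric array.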

> 
> We also use MA extensively (because in the health care domain life is
> full of missing data) - I'll jot down some thoughts on how MA could be
> improved in the next few days.

I'd be very interested in hearing your thoughts on improving MA.

-- 
Todd Miller <jmiller at stsci.edu>




