[Numpy-discussion] Speeding up numarray -- questions on its design

Perry Greenfield perry at stsci.edu
Mon Jan 17 11:12:28 EST 2005


Travis Oliphant wrote:

> I have some comments based on perusing its source.   I don't want to
> seem overly critical, so please take my comments with the understanding
> that I appreciate the extensive work that has gone into Numarray.  I do
> think that Numarray has made some great strides.  I would really like to
> see a unification of Numeric and Numarray.
>

> 1) Are there plans to move the nd array entirely into C?
>    -- I would like to see the nd array become purely a c-type. I would
> be willing to help here.  I can see that part of the work has been done.
>
I don't know that I would say the plans are definite, but at some
point we thought that would be necessary. We haven't done it yet
because doing so makes the code harder to change, so it would be one
of the last changes to the core that we would want to make. Our
current priorities are making all the major libraries and packages
available under it first and then finishing optimization issues.
Another issue that has to be tackled soon is handling 64-bit
addressing; apparently the work to make Python sequences use 64-bit
addresses is nearing completion, so we want to be able to handle that.
I expect we would want to find a way of handling that before we turn
it all into C, but maybe it is just as easy doing them in the opposite
order.

> 2) Why is the ND array C-structure so large?   Why are the dimensions
> and strides array static? Why can't the extra stuff that the fancy
> arrays need be another structure and the numarray C structure just
> extended with a pointer to the extra stuff?

When Todd moved NDArray into C, he tried to keep it simple.  As such,
it has no "moving parts."  We think making dimensions and strides
malloc'ed rather than static would be fairly easy.  Making the "extra
stuff" variable is something we can look at.

The bottom line is that adding the variability adds complexity, and
we're not sure we understand the storage economics of why we would be
doing it. Numarray was designed, first and foremost, for large arrays.
For that case, the array struct size is irrelevant whereas additional
complexity is not.  I guess we would like to see some good practical
examples where the array struct size matters. Do you have code with
hundreds of thousands of small arrays existing simultaneously?
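
To make the trade-off above concrete, here is a minimal sketch in C of
the two layouts being discussed. It is not the numarray struct: the
type names, the helper functions and the rank limit SKETCH_MAXDIM are
all assumptions made up for illustration, and the "extra stuff" needed
by the fancier array types is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rank limit for this sketch; not taken from the numarray headers. */
#define SKETCH_MAXDIM 40

/* Layout A: shape and strides stored inline, no separate allocation. */
typedef struct {
    char *data;
    int nd;
    long dimensions[SKETCH_MAXDIM];
    long strides[SKETCH_MAXDIM];
} StaticArray;

/* Layout B: shape and strides live in one malloc'ed block of 2*nd longs. */
typedef struct {
    char *data;
    int nd;
    long *dimensions;
    long *strides;
} DynamicArray;

/* Bytes of bookkeeping per array, excluding the data buffer itself. */
size_t static_footprint(void)
{
    return sizeof(StaticArray);
}

size_t dynamic_footprint(int nd)
{
    assert(nd >= 0);
    return sizeof(DynamicArray) + 2 * (size_t)nd * sizeof(long);
}

/* Returns 0 on success, -1 if the allocation fails. */
int dynamic_init(DynamicArray *a, int nd)
{
    assert(nd >= 0);
    a->data = NULL;
    a->nd = nd;
    a->dimensions = NULL;
    a->strides = NULL;
    if (nd == 0)
        return 0;  /* a rank-0 array needs no shape storage */
    a->dimensions = calloc(2 * (size_t)nd, sizeof(long));
    if (a->dimensions == NULL)
        return -1;
    a->strides = a->dimensions + nd;  /* second half of the same block */
    return 0;
}

void dynamic_free(DynamicArray *a)
{
    free(a->dimensions);  /* strides shares this block; free(NULL) is a no-op */
    a->dimensions = NULL;
    a->strides = NULL;
}
```

Under these assumptions, layout A costs the same number of bytes
whatever the rank, while layout B scales with nd but adds an
allocation, a failure path and a matching free for every array. The
sketch ignores allocator overhead, so the real saving from layout B
would be somewhat smaller than the two footprint functions suggest.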

> 3) There seem to be too many files to define the array.  The mixture of
> Python and C makes trying to understand the source very difficult.  I
> thought one of the reasons for the re-write was to simplify the source
> code.
>
I think this reflects the transitional nature of going from mostly
Python to a hybrid. We agree that the current state is more convoluted
than it ought to be. If NDarray were all C, much of this would end
(though in some respects, being all in C will make it larger and
harder to understand as well). The original hope was that most of the
array setup computation could be kept in Python, but that is what made
it slow for small arrays (it did, however, allow us to implement it
reasonably quickly with big array performance so that we could start
using it for our own projects without a long development effort).
Unfortunately, the simplification in the rewrite is offset by handling
the more complex cases (byte-swapping, etc.) and extra array indexing
capabilities.

> 4) Object arrays must be supported.  This was a bad oversight and an
> important feature of Numeric arrays.
>
The current implementation does support them (in a different way, and
generally not as efficiently; Todd is more up on the details here).
Which aspect of object arrays are you finding lacking? The C-API?

> 5) The ufunc code interface needs to continue to be improved.  I do see
> that some effort into understanding the old ufunc interface has taken
> place which is a good sign.
>
You are probably referring to work underway to integrate with scipy (I'm
assuming you are looking at the version in CVS).

> Again, thanks to the work that has been done.  I'm really interested to
> see if some of these modifications can be done as in my mind it will
> help the process of unifying the two camps.
>
I'm glad to see that you are taking a look at it and welcome the
comments and any offers of help in improving speed.

Perry




