[Numpy-discussion] Speeding up numarray -- questions on its design

Travis Oliphant oliphant at ee.byu.edu
Tue Jan 18 10:28:37 EST 2005


Thanks for the comments that have been made.  One of my reasons for 
commenting is to get an understanding of which design issues of Numarray 
are felt to be important and which can change.  There seems to be this 
idea that small arrays are not worth supporting.  I hope this is just 
due to time-constraints and not some fundamental idea that small arrays 
should never be considered with Numarray.    Otherwise, there will 
always be two different array implementations developing at their own pace.

I really want to gauge how willing developers of numarray are to 
changing things.


Perry Greenfield wrote:

>> 1) Are there plans to move the nd array entirely into C?
>>    -- I would like to see the nd array become purely a c-type. I would
>> be willing to help here.  I can see that part of the work has been done.
>>
> I don't know that I would say they are definite, but I think that at
> some point we thought that would be necessary. We haven't yet since
> doing so makes it harder to change so it would be one of the last
> changes to the core that we would want to do. Our current priorities
> are towards making all the major libraries and packages available
> under it first and then finishing optimization issues (another issue
> that has to be tackled soon is handling 64-bit addressing; apparently
> the work to make Python sequences use 64-bit addresses is nearing
> completion so we want to be able to handle that. I expect we would
> want to make sure we find a way of handling that before we turn it
> all into C but maybe it is just as easy doing them in the opposite
> order.

I do not think it would be difficult at this point to move it all to C 
and then make future changes there (you can always call pure Python code 
from C).  With the structure in place and some experience behind you, 
now seems like as good a time as any.  Especially, because now is a 
better time for me than any... I like what numarray is doing by not 
always defaulting to ints with the maybelong type.  It is a good idea.

>
>> 2) Why is the ND array C-structure so large?   Why are the dimensions
>> and strides array static? Why can't the extra stuff that the fancy
>> arrays need be another structure and the numarray C structure just
>> extended with a pointer to the extra stuff?
>
>
> When Todd moved NDArray into C,  he tried to keep it simple.  As such,  it
> has no "moving parts."  We think making dimensions and strides malloc'ed
> rather than static would be fairly easy.  Making the "extra stuff"
> variable is something we can look at.

But allocating dimensions and strides when needed is not difficult and 
it reduces the overhead of the ndarray object.  Currently, that overhead 
seems extreme.  I could be over-reacting here, but it just seems like it 
would have made more sense to expand the array object as little as 
possible to handle the complexity that you were searching for.  It seems 
like more modifications were needed in the ufunc than in the arrayobject.
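
To make the "allocate dimensions and strides when needed" point concrete, here is a minimal sketch in C. It is not numarray's or Numeric's actual code: the `small_array` struct, its field names, and both helper functions are hypothetical, chosen only for illustration. It assumes a C-contiguous layout and puts the shape and the strides in a single malloc'ed block sized by the number of dimensions, so a rank-0 array allocates nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal array header: the shape storage is allocated per
   array instead of being a fixed-size member of the struct. */
typedef struct {
    char *data;        /* start of the element buffer (not owned here) */
    int nd;            /* number of dimensions */
    long *dimensions;  /* nd entries, or NULL when nd == 0 */
    long *strides;     /* nd entries, carved from the same allocation */
} small_array;

/* Set the shape and C-contiguous strides using a single malloc.
   Returns 0 on success, -1 on bad input or allocation failure. */
static int small_array_set_shape(small_array *a, int nd,
                                 const long *dims, long itemsize)
{
    a->nd = 0;
    a->dimensions = NULL;
    a->strides = NULL;
    if (nd < 0)
        return -1;
    if (nd == 0)
        return 0;  /* rank-0 array: nothing to allocate */

    long *block = malloc(2 * (size_t)nd * sizeof *block);
    if (block == NULL)
        return -1;

    a->nd = nd;
    a->dimensions = block;
    a->strides = block + nd;

    /* The last axis varies fastest, so walk the axes from the end. */
    long stride = itemsize;
    for (int i = nd - 1; i >= 0; i--) {
        a->dimensions[i] = dims[i];
        a->strides[i] = stride;
        stride *= dims[i];
    }
    return 0;
}

/* Release the shape block; strides live in the same allocation. */
static void small_array_free_shape(small_array *a)
{
    free(a->dimensions);
    a->dimensions = NULL;
    a->strides = NULL;
    a->nd = 0;
}
```

With this layout the per-array cost of the shape is `2 * nd * sizeof(long)` bytes plus the allocator's own bookkeeping, rather than a fixed maximum reserved for every array.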

>
> The bottom line is that adding the variability adds complexity and we're
> not sure we understand the storage economics of why we would be doing it.
> Numarray was designed,  first and foremost,  for large arrays.

Small arrays are never going to disappear (Fernando Perez has an 
excellent example) and there are others.  A design where a single 
pointer not being NULL is all that is needed to distinguish "simple" 
Numeric-like arrays from "fancy" numarray-like arrays seems like a great 
way to make sure that

> For that case,
> the array struct size is irrelevant whereas additional complexity is
> not.  I guess we would like to see some good practical examples where
> the array struct size matters. Do you have code with hundreds of
> thousands of small arrays existing simultaneously?

As mentioned before, such code exists especially when arrays become a 
basic datatype that you use all the time.    How much complexity is 
really generated by offloading the extra struct material to a bigarray 
structure, thereby only increasing the Numeric array structure by 4 
bytes instead of 200+?
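
The offloading idea can be sketched as follows. Everything here is an assumption made for illustration: `fancy_extras`, `lean_array`, `fat_array`, and their fields are hypothetical names, and the 200-byte `scratch` member is only a stand-in for the extra material, not a measurement of numarray's real struct. The point is that the lean header grows by one pointer (`sizeof(void *)`, which is 4 bytes on a 32-bit build), and a NULL check is all that separates the simple case from the fancy one.

```c
#include <stddef.h>

/* Hypothetical out-of-line state needed only by "fancy" arrays. */
typedef struct {
    int byteswapped;
    int misaligned;
    long byteoffset;
    char scratch[200];  /* stand-in for the bulk of the extra material */
} fancy_extras;

/* Lean header: simple, Numeric-like arrays leave extras NULL. */
typedef struct {
    char *data;
    int nd;
    long *dimensions;
    long *strides;
    fancy_extras *extras;  /* NULL means a simple array */
} lean_array;

/* For comparison: the same header with the extras embedded in it. */
typedef struct {
    char *data;
    int nd;
    long *dimensions;
    long *strides;
    fancy_extras extras;
} fat_array;

/* A single pointer test distinguishes the two kinds of array. */
static int lean_array_is_simple(const lean_array *a)
{
    return a->extras == NULL;
}

/* Bytes saved per simple array by keeping the extras out of line. */
static size_t bytes_saved_per_simple_array(void)
{
    return sizeof(fat_array) - sizeof(lean_array);
}
```

A fancy array pays for one extra allocation and one pointer dereference; a simple array pays only for the pointer field.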

On another fundamental note, numarray is being sold as a replacement for 
Numeric.  But, then, on closer inspection many things that Numeric does 
well, numarray is ignoring or not doing very well.  I think this 
presents a certain amount of false advertising to new users, who don't 
understand the history.  Most of them would probably never need the 
fanciness that numarray provides and would be quite satisfied with 
Numeric.  They just want to know what others are using.   I think it is 
a disservice to call numarray a replacement for Numeric until it 
actually is.  It should currently be called an "alternative 
implementation" focused on large arrays.  This (unintentional) sleight of 
hand that has been occurring over the past year has been my biggest 
complaint with numarray.   Making numarray a replacement for Numeric 
means that it has to support small arrays, object arrays, and ufuncs at 
least as well as but preferably better than Numeric.  It should also be 
faster than Numeric whenever possible, because Numeric has lots of 
potential optimizations that have never been applied.   If numarray does 
not do these things, then in my mind it cannot be a replacement for 
Numeric and should stop being called that on the numpy web site.

>> 3) There seem to be too many files to define the array.  The mixture of
>> Python and C makes trying to understand the source very difficult.  I
>> thought one of the reasons for the re-write was to simplify the source
>> code.
>>
> I think this reflects the transitional nature of going from mostly Python
> to a hybrid. We agree that the current state is more convoluted than it
> ought to be. If NDarray were all C, much of this would end (though in
> some respects, being all in C will make it larger, harder to understand
> as well). The original hope was that most of the array setup computation
> could be kept in Python but that is what made it slow for small arrays
> (but it did allow us to implement it reasonably quickly with big array
> performance so that we could start using for our own projects without
> a long development effort). Unfortunately, the simplification in the
> rewrite is offset by handling the more complex cases (byte-swapping,
> etc.) and extra array indexing capabilities.

I never really understood the "code is too complicated" argument 
anyway.   I was just wondering if there is some support for reducing the 
number of source code files, or reorganizing them a bit.

>> 4) Object arrays must be supported.  This was a bad oversight and an
>> important feature of Numeric arrays.
>>
> The current implementation does support them (though in a different
> way, and generally not as efficiently, though Todd is more up on the
> details here). What aspect of object arrays are you finding lacking?
> C-api?

I did not see such support when I looked at it, but given the previous 
comment, I could easily have missed where that support is provided.  I'm 
mainly following up on Konrad's comment that his Automatic 
differentiation does not work with Numarray because of the missing 
support for object arrays.  There are other applications for object 
arrays as well.   Most of the support needs to come from the ufunc side.

>
>> 5) The ufunc code interface needs to continue to be improved.  I do see
>> that some effort into understanding the old ufunc interface has taken
>> place which is a good sign.
>>
> You are probably referring to work underway to integrate with scipy (I'm
> assuming you are looking at the version in CVS).

Yes, I'm looking at the CVS version.

>
>> Again, thanks to the work that has been done.  I'm really interested to
>> see if some of these modifications can be done as in my mind it will
>> help the process of unifying the two camps.
>>
> I'm glad to see that you are taking a look at it and welcome the comments
> and any offers of help in improving speed.
>
I would be interested in helping if there is support for really making 
numarray a real replacement for Numeric, by addressing the concerns that 
I've outlined.     As stated at the beginning, I'm really just looking 
for how receptive numarray developers would be to the kinds of changes 
I'm talking about: (1) reducing the size of the array structure,  (2) 
moving the ndarray entirely into C, (3) improving support for object 
arrays, (4) improving ufunc API support.

I care less about array and ufunc C-API names being the same than the 
underlying capabilities being available.

Best regards,

-Travis Oliphant








