[Numpy-discussion] the direction and pace of development

Thu Jan 22 09:56:04 EST 2004

As a relative newcomer to this discussion, I would like to respond on a 
couple of points.

eric jones wrote:

> Good thing Duke is beating Maryland as I read, otherwise, mail like 
> this can make you grumpy. :-)
>
> Joe Harrington wrote:
>
[snip]

>> THE PATH
>>
>> Here is what I suggest:
>>
>> 1. We should identify the remaining open interface questions.  Not,
>>   "why is numeric faster than numarray", but "what should the syntax
>>   of creating an array be, and of doing different basic operations".
>>   If numeric and numarray are in agreement on these issues, then we
>>   can move on, and debate performance and features later.
>>  
>>
> ?? I don't get this one.  This interface (at least for numarray) is 
> largely decided.  We have argued the points, and Perry et al. at 
> STScI made the decisions.  I didn't like some of them, and I'm sure 
> everyone else had at least one thing they wished was changed, but that 
> is the way this open stuff works. 

I have wondered whether the desire to be compatible with Numeric has 
been an inhibitory factor for numarray.  It might be interesting to see 
the list of decisions which Eric Jones doesn't like.

>
> It is not the interface but the implementation that started this 
> furor.  Travis O.'s suggestion was to back port (much of) the numarray 
> interface to the Numeric code base so that those stuck supporting 
> large codebases (like SciPy) and needing fast small arrays could 
> benefit from the interface enhancements.  One or two of them had 
> backward compatibility issues with Numeric, so he asked how it should 
> be handled.  Unless some magic porting fairy shows up, SciPy will be a 
> Numeric only tool for the next year or so.  This means that users of 
> SciPy either have to forgo some of these features or back port. 

Back porting would appear, to this outsider, to be a regression.  Is 
there no way of changing numarray so that it has the desired speed for 
small arrays?

>
>
> On speed:  <excerpt from private mail to Perry>
> Numeric is already too slow -- we've had to recode a number of 
> routines in C that I don't think we should have in a recent project.  
> For us, the goal is not to approach Numeric's speed but to 
> significantly beat it for all array sizes.  That has to be a 
> possibility for any replacement.  Otherwise, our needs (with the 
> exception of a few features) are already better met by Numeric.  I 
> have some worries about all of the endianness and memory mapped 
> support that are built into Numarray imposing too much overhead for 
> speed-ups on small arrays to be possible (this echoes Travis O's 
> thoughts -- we will happily be proven wrong).  None of our current 
> work needs these features, and paying a price for them is hard to do 
> with an alternative already there.  It is fairly easy to improve its 
> performance on mathematical operations by just changing the way the ufunc 
> operations are coded.  With some reasonably simple changes, Numeric 
> should be comparable (or at least closer) to Numarray speed for large 
> arrays.  Numeric also has a large number of other optimizations that 
> can be made (memory is zeroed twice in zeros(), asarray was recently 
> improved significantly for the typical case, etc.).  Making these 
> changes would help our selling of Python and, since we have at least a 
> year's worth of applications that will be on the SciPy/Numeric 
> platform, it will also help the quality of these applications.
>
> Oh yeah, I have also been surprised at how much of our code uses 
> alltrue(), take(), isnan(), etc.  The speed of these array 
> manipulation methods is really important for us.

I am surprised that alltrue() performance is a concern, but it should be 
easy to implement short circuit evaluation so that False responses are, 
on average, handled more quickly.  If Boolean arrays are significant, 
in terms of the amount of computer time taken, should they be stored as 
bit arrays?  Would there be a pay-off for the added complexity?
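
To make the two ideas above concrete, here is a rough sketch in plain 
Python.  It is not how Numeric or numarray implement alltrue(); the 
function names (alltrue_short_circuit, pack_bits, alltrue_packed) and 
the least-significant-bit-first packing order are my own assumptions, 
chosen only for illustration.

```python
def alltrue_short_circuit(values):
    """Return True if every element is truthy.

    Stops at the first falsy element, so a False answer costs only as
    many tests as it takes to reach that element.
    """
    for v in values:
        if not v:
            return False
    return True


def pack_bits(flags):
    """Pack booleans into a bytearray, 8 flags per byte.

    Hypothetical layout: flag i lives in byte i // 8, bit i % 8
    (least significant bit first).  Returns (packed, count).
    """
    flags = list(flags)
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i >> 3] |= 1 << (i & 7)
    return packed, len(flags)


def alltrue_packed(packed, n):
    """alltrue() over a packed bit array of n flags.

    Tests eight flags per comparison and still short-circuits on the
    first byte that contains a zero bit.
    """
    full, rest = divmod(n, 8)
    for i in range(full):
        if packed[i] != 0xFF:
            return False
    if rest:
        # Only the low `rest` bits of the final byte hold real flags.
        mask = (1 << rest) - 1
        return (packed[full] & mask) == mask
    return True


if __name__ == "__main__":
    flags = [True] * 10
    packed, n = pack_bits(flags)
    print(alltrue_short_circuit(flags), alltrue_packed(packed, n))
```

The packed form needs one eighth of the storage of a byte-per-element 
Boolean array and lets alltrue() examine eight flags at a time.  The 
price is the shifting and masking needed to read or write a single 
element, which is the added complexity I am asking about.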

>
> [snip]
>
>> 3. We should collect or implement a very minimal version of the
>>   featureset, and document it well enough that others like us can do
>>   simple but real tasks to try it out, without reading source code.
>>   That documentation should include lists of things that still need
>>   to be done.
>>   
>
Does numarray not provide the basics?

>> [snip]
>> The open source model is successful because it follows closely
>> something that has worked for a long time: the scientific method, with
>> its community contributions, peer review, open discussion, and
>> progress mainly in small steps.  Once basic capability is out there,
>> we can twiddle with how to improve things behind the scenes.
>>
>>  
>>
Colin W.