[Numpy-discussion] Numpy x Matlab: some synthetic benchmarks

Fernando Perez Fernando.Perez at colorado.edu
Wed Jan 18 14:01:02 EST 2006


Travis Oliphant wrote:
> Andrew Straw wrote:
> 
> 
>>Here's an idea Fernando and I have briefly talked about off-list, but 
>>which perhaps bears talking about here: Is there speed to be gained by 
>>an alternative, very simple, very optimized ndarray constructor? The 
>>idea would be a special-case constructor with very limited 
>>functionality designed purely for speed. It wouldn't support (m)any of 
>>the fantastic things Travis has done, but would be useful only in 
>>specialized use cases, such as creating indices.
> 
> 
> The general purpose constructor is
> 
> PyArray_NewFromDescr(...)
> 
> I suspect this could be special-cased for certain circumstances and the 
> special case called occasionally.  There are checks on the dimensions 
> that could be avoided in some circumstances (like when we are getting 
> the dimensions from another array object already...)
> 
> We could also inline the __array_from_strides code...
> 
> Besides that, I'm not sure what else to optimize...
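
To make the overhead in question concrete: when you create many small arrays, 
the per-array setup cost in the constructor easily dominates the cost of 
touching the elements themselves.  A rough Python-level illustration (just a 
sketch, not a proper benchmark; the numbers will obviously depend on your 
machine and build):

     import timeit
     import numpy

     # Creating 100k small arrays: dominated by constructor overhead,
     # not by the work done on the three elements.
     t_create = timeit.timeit(lambda: numpy.zeros(3), number=100000)

     # Reusing one preallocated array: the per-iteration cost of the
     # actual element work, with the constructor out of the loop.
     buf = numpy.zeros(3)
     t_reuse = timeit.timeit(lambda: buf.fill(0), number=100000)

     print('create: %.3fs   reuse: %.3fs' % (t_create, t_reuse))

To the extent the create case comes out much slower per iteration than the 
reuse case, that gap is exactly what a stripped-down constructor would attack.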

Just to give some context: this idea came to mind inspired by Blitz++'s 
TinyVector and TinyMatrix objects.  In Blitz, arrays have compile-time rank 
but run-time size in all dimensions.  Since this introduces some overhead, 
Blitz also offers the Tiny* classes, which are fixed at compile time _both_ 
in rank and in size.  This allows a number of optimizations to be made on 
them, at the cost of some flexibility.  Some info on these guys:

http://www.oonumerics.org/blitz/manual/blitz07.html

What Andrew and I discussed was the idea of writing some object which would 
only support the most basic operations: element-wise arithmetic, slicing, 
linear algebra calls on them (matrix-matrix, matrix-vector and vector 
operations), and little else.  I'd be OK losing fancy indexing, byteswapping, 
memory-mapping, reshaping, and anything else which costs any of the following 
(a rough sketch of such an object appears just after this list):

1. initialization-time CPU cycles
2. memory footprint
3. runtime element access and arithmetic.
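
Concretely, the Python-side surface I imagine is tiny.  Here is a pure-Python 
sketch (TinyArray is a made-up name, and a real version would of course be 
done in C for speed; this only shows the deliberately small interface):

     import numpy

     class TinyArray(object):
         # Sketch only: rank and size are fixed at creation time, so
         # there is no reshaping, byteswapping or memory-mapping
         # machinery anywhere.
         __slots__ = ('_data',)

         def __init__(self, data):
             self._data = numpy.array(data, dtype=numpy.float64)

         def __getitem__(self, index):
             # Plain indexing and slicing only -- no fancy indexing.
             return self._data[index]

         def __add__(self, other):
             return TinyArray(self._data + other._data)

         def __mul__(self, other):
             return TinyArray(self._data * other._data)

The point is not this wrapper (which in pure Python only adds overhead), but 
the small surface area: with so few operations, each one can be specialized 
aggressively in C.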

Such objects could be very useful in many contexts.  I'd even like an 
immutable version, so they could be used as dictionary keys without having to 
make a tuple out of them first.  That would let algorithms which use small 
arrays as multidimensional indices into sparse tree structures run without 
the hoops one must jump through today, and with higher performance.
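
For the dictionary-key use, something along these lines would do (again, 
FrozenArray is a made-up name and this is just a sketch of the idea):

     import numpy

     class FrozenArray(object):
         # Sketch of the immutable flavour: hashable, so it can serve
         # directly as a dictionary key.
         __slots__ = ('_data', '_hash')

         def __init__(self, data):
             self._data = numpy.array(data)
             self._data.flags.writeable = False   # freeze the buffer
             self._hash = hash(tuple(self._data.ravel()))

         def __hash__(self):
             return self._hash

         def __eq__(self, other):
             return (self._data.shape == other._data.shape and
                     bool((self._data == other._data).all()))

     # A sparse tree indexed directly by small index arrays:
     tree = {}
     tree[FrozenArray((1, 2, 3))] = 'leaf'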

I wonder if giving up reshaping would allow the indexing code to be faster: 
specialized versions could be hard-coded for each rank, with only, say, ranks 
1-4 offered for this kind of object.  (I know we recently had a discussion 
about large ranks, but this object would be geared towards pure performance, 
and working in 32 dimensions is certainly a 'flexibility-driven' case where 
the generic objects are called for.)
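
To sketch what I mean by hard-coding per rank (in Python for readability; the 
real win would come from doing this in C): the generic code loops over ndim 
to turn an index tuple into a flat offset, while a per-rank version collapses 
to a single expression.  The offset_rank* names are made up for illustration:

     def offset_rank2(shape, i, j):
         # C-order flat offset, hard-coded for rank 2: no loop over
         # ndim, no general-purpose dimension checks.
         return i * shape[1] + j

     def offset_rank3(shape, i, j, k):
         return (i * shape[1] + j) * shape[2] + k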

Note that I had never mentioned this in public, because I think it may be a 
niche specialization that isn't needed this early on, while the library's 
current priority is to get off the ground.  But having such objects could be 
very handy, and now that the C API is starting to stabilize, maybe someone 
can play with this as a side project.  Once they prove their worth, these 
beasts could be folded into the official distribution.

I am not really qualified to judge whether there are enough opportunities for 
optimization that the sacrifices indicated above would really pay off, both 
in terms of memory and of performance.

Cheers,

f



