[Numpy-discussion] Numpy x Matlab: some synthetic benchmarks

Wed Jan 18 14:42:06 EST 2006

Fernando Perez <Fernando.Perez at colorado.edu> writes:

> Travis Oliphant wrote:
>> Andrew Straw wrote:
>>
>>> Here's an idea Fernando and I have briefly talked about off-list,
>>> but which perhaps bears talking about here: Is there speed to be
>>> gained by an alternative, very simple, very optimized ndarray
>>> constructor? The idea would be a special-case constructor with very
>>> limited functionality designed purely for speed. It wouldn't
>>> support (m)any of the fantastic things Travis has done, but would
>>> be useful only in specialized use cases, such as creating indices.
>> The general purpose constructor is
>> PyArray_NewFromDescr(...)
>> I suspect this could be special cased for certain circumstances and
>> the special-case called occasionally. Their are checks on the
>> dimensions that could be avoided in certain circumstances (like when
>> we are getting the dimensions from another arrayobject already...)
>> We could also inline the __array_from_strides code...
>> Besides that, I'm not sure what else to optimize...
>
> Just to give some context: this came to my mind inspired by Blitz++'s
> TinyVector and TinyMatrix objects.  In Blitz, arrays have compile-time
> rank, but run-time size in all dimensions.  Since this introduces some
> overhead, Blitz offers also the Tiny* classes, which are compile-time
> fixed _both_ in rank and in size.  This allows a number of
> optimizations to be made on them, at the cost of some flexibility
> lost.  Some info on these guys:
>
> http://www.oonumerics.org/blitz/manual/blitz07.html
>
> What Andrew and I discussed was the idea of writing some object which
> would only support the most basic operations: element-wise arithmetic,
> slicing, linear algebra calls on them (matrix-matrix, matrix-vector
> and vector operations), and little else.  I'd be OK losing fancy
> indexing, byteswapping, memory-mapping, reshaping, and anything else
> which costs either:
>
> 1. initialization-time CPU cycles
> 2. memory footprint
> 3. runtime element access and arithmetic.
>
> Such objects could be very useful in many contexts.  I'd even like an
> immutable version, so they could be used as dictionary keys without
> having to make a tuple out of them.  This would allow algorithms which
> use small arrays as multidimensional indices in sparse tree structures
> to be used without the hoops one must jump through today, and with
> higher performance.

I've done a little bit of work along these lines. I have a module I
call vector3 [*] which has 2- and 3-dimensional immutable vectors,
using either ints or doubles. It's as fast as I could make it, while
keeping it all written in Pyrex. I find it very convenient for
anything vector-related. Konrad Hinsen has something similiar in the
development version of his ScientificPython package.

[*] http://arbutus.mcmaster.ca/dmc/software/vector3.html

Also, I've also done some playing around with a n-dimensional vector
type (restricted to doubles). My best attempts make it ~4-5x faster
than numpy (and 2x faster than Numeric) for vectors of dimension 10
on simple ops like + and *, 2x faster than numpy for dimension 1000,
and approaching 1x as you make the vectors larger. Indexing is about
3x faster than numpy, and 1.4x faster than Numeric. So that gives I
think some idea of the maximum speed-up possible.

I think the speedups mostly come from the utter lack of any
polymorphism: it handles vectors of doubles only, and only as
contiguous vectors (no strides).

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca