Fernando Perez Fernando.Perez@colorado.edu writes:
Travis Oliphant wrote:
Andrew Straw wrote:
Here's an idea Fernando and I have briefly talked about off-list, but which perhaps bears talking about here: Is there speed to be gained by an alternative, very simple, very optimized ndarray constructor? The idea would be a special-case constructor with very limited functionality designed purely for speed. It wouldn't support (m)any of the fantastic things Travis has done, but would be useful only in specialized use cases, such as creating indices.
The general-purpose constructor is PyArray_NewFromDescr(...). I suspect this could be special-cased for certain circumstances, and the special case called occasionally. There are checks on the dimensions that could be avoided in some cases (like when we are getting the dimensions from another array object already...). We could also inline the __array_from_strides code... Besides that, I'm not sure what else to optimize...
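To make the potential win concrete, here is a quick timing sketch (my addition, not from the numpy sources; numbers will vary by machine and version) showing how much of a small-array workload is pure construction overhead at the Python level:

    import timeit

    # Compare constructing a tiny array from scratch against reusing one
    # that already exists.  np.array() pays the full parsing/validation
    # cost; np.empty() skips the data copy but still runs the argument
    # checks; operating in-place on a preallocated array avoids the
    # constructor entirely.
    setup = "import numpy as np; data = [1.0, 2.0, 3.0]; a = np.empty(3)"
    for stmt in ("np.array(data)",   # general-purpose constructor
                 "np.empty(3)",      # no data copy, still full checks
                 "a *= 1.0"):        # no construction at all
        t = timeit.Timer(stmt, setup).timeit(100000)
        print("%-16s %.3f usec/loop" % (stmt, t * 10))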
Just to give some context: this came to my mind inspired by Blitz++'s TinyVector and TinyMatrix objects. In Blitz, arrays have compile-time rank but run-time size in all dimensions. Since this introduces some overhead, Blitz also offers the Tiny* classes, which are fixed at compile time _both_ in rank and in size. This allows a number of optimizations to be made on them, at the cost of some flexibility. Some info on these guys:
What Andrew and I discussed was the idea of writing an object which would support only the most basic operations: element-wise arithmetic, slicing, linear algebra calls (matrix-matrix, matrix-vector and vector operations), and little else. I'd be OK losing fancy indexing, byteswapping, memory-mapping, reshaping, and anything else which costs any of the following (a rough sketch of such an interface appears after the list):
- initialization-time CPU cycles
- memory footprint
- runtime element access and arithmetic.
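To make that wish list concrete, here is a rough pure-Python sketch of the kind of stripped-down interface we have in mind (illustrative only; the class name is made up, and a real version would be written in Pyrex or C):

    import array

    class SmallVector(object):
        """Fixed-size vector of doubles: element-wise math and little else."""
        __slots__ = ('_data',)

        def __init__(self, values):
            # Contiguous doubles only: no dtypes, no strides, no byteswapping.
            self._data = array.array('d', values)

        def __len__(self):
            return len(self._data)

        def __getitem__(self, i):        # scalar and slice access
            return self._data[i]

        def __add__(self, other):
            return SmallVector(x + y for x, y in zip(self._data, other._data))

        def __mul__(self, scalar):
            return SmallVector(x * scalar for x in self._data)

        def dot(self, other):
            # About the only linear-algebra call small vectors really need.
            return sum(x * y for x, y in zip(self._data, other._data))

e.g. SmallVector([1.0, 2.0, 3.0]) + SmallVector([4.0, 5.0, 6.0]) builds a new vector without touching any of the general-array machinery.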
Such objects could be very useful in many contexts. I'd even like an immutable version, so they could be used as dictionary keys without having to make a tuple out of them. That would let algorithms which use small arrays as multidimensional indices into sparse tree structures run without the hoops one must jump through today, and with higher performance.
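As a hedged sketch of the dictionary-key use (the class and names here are mine, purely illustrative): a tuple subclass already gets immutability and hashing for free, leaving only the arithmetic to add:

    class FrozenVec(tuple):
        """Immutable small vector: hashable, so usable directly as a dict key."""
        __slots__ = ()

        def __add__(self, other):        # element-wise, unlike tuple's concat
            return FrozenVec(x + y for x, y in zip(self, other))

    # A sparse 3-d structure indexed by vectors, with no tuple() conversions:
    sparse = {}
    key = FrozenVec((1, 0, 2))
    sparse[key] = 42.0
    sparse[key + FrozenVec((0, 1, 0))] = 7.5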
I've done a little bit of work along these lines. I have a module I call vector3 [*] which has 2- and 3-dimensional immutable vectors, using either ints or doubles. It's as fast as I could make it while keeping it all written in Pyrex. I find it very convenient for anything vector-related. Konrad Hinsen has something similar in the development version of his ScientificPython package.
Also, I've done some playing around with an n-dimensional vector type (restricted to doubles). My best attempts make it ~4-5x faster than numpy (and 2x faster than Numeric) for vectors of dimension 10 on simple ops like + and *, 2x faster than numpy for dimension 1000, and approaching 1x as you make the vectors larger. Indexing is about 3x faster than numpy, and 1.4x faster than Numeric. So that gives, I think, some idea of the maximum speed-up possible.
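For anyone who wants to reproduce that kind of comparison, a micro-benchmark along these lines (my sketch, numpy only; the same pattern works for Numeric if you have it installed) gives the per-operation cost:

    import timeit

    # Element-wise '+' across sizes: per-call overhead dominates for small
    # n and washes out as n grows, which is why the ratios above shrink.
    for n in (10, 1000, 100000):
        setup = "import numpy; a = numpy.ones(%d); b = numpy.ones(%d)" % (n, n)
        t = timeit.Timer("a + b", setup).timeit(10000)
        print("numpy, n=%-7d %.2f usec per add" % (n, t * 100))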
I think the speedups mostly come from the utter lack of any polymorphism: it handles vectors of doubles only, and only as contiguous vectors (no strides).