Travis Oliphant wrote:
Andrew Straw wrote:
Here's an idea Fernando and I have briefly talked about off-list, but which perhaps bears talking about here: Is there speed to be gained by an alternative, very simple, very optimized ndarray constructor? The idea would be a special-case constructor with very limited functionality designed purely for speed. It wouldn't support (m)any of the fantastic things Travis has done, but would be useful only in specialized use cases, such as creating indices.
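To make the construction overhead concrete, here is a rough pure-Python timing sketch. The helper names `make_index` and `make_index_fast` are invented for the benchmark and are not proposed API; the point is only that the general-purpose `np.array` call pays for sequence inspection that an allocate-then-fill path skips:

```python
# Rough illustration (not the proposed constructor): comparing the
# general-purpose np.array path against a cheaper allocate-then-fill path
# when building a tiny index array.
import timeit
import numpy as np

def make_index():
    # general-purpose constructor: has to inspect the Python sequence
    return np.array([2, 3], dtype=np.intp)

def make_index_fast():
    # cheaper path: allocate uninitialized memory, then fill in the values
    a = np.empty(2, dtype=np.intp)
    a[0] = 2
    a[1] = 3
    return a

t_general = timeit.timeit(make_index, number=10_000)
t_fast = timeit.timeit(make_index_fast, number=10_000)
print(f"np.array: {t_general:.4f}s   empty+fill: {t_fast:.4f}s")
```

Both return the same array; the interesting quantity is the per-call construction cost, which is exactly what a special-case constructor would attack.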
The general-purpose constructor is
I suspect this could be special-cased for certain circumstances and the special case called when those circumstances arise. There are checks on the dimensions that could be avoided in certain circumstances (for example, when we are getting the dimensions from another array object already...)
We could also inline the __array_from_strides code...
Besides that, I'm not sure what else to optimize...
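At the Python level, a rough analogy for the "dimensions already come from another array" case is the difference between `np.empty(a.shape, a.dtype)` and `np.empty_like(a)` -- the latter takes shape and dtype directly from an existing array instead of validating a user-supplied shape tuple. This is only an analogy for the C-level special case being discussed, not that code itself:

```python
# Sketch: when the shape/dtype come from an existing array, construction
# can take them directly rather than re-checking a user-supplied shape.
import numpy as np

a = np.arange(12).reshape(3, 4)

b1 = np.empty(a.shape, a.dtype)   # shape passed as a Python tuple, re-checked
b2 = np.empty_like(a)             # shape/dtype read straight from `a`

print(b1.shape, b2.shape, b2.dtype)
```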
Just to give some context: this came to my mind inspired by Blitz++'s TinyVector and TinyMatrix objects. In Blitz, arrays have compile-time rank, but run-time size in all dimensions. Since this introduces some overhead, Blitz offers also the Tiny* classes, which are compile-time fixed _both_ in rank and in size. This allows a number of optimizations to be made on them, at the cost of some flexibility lost. Some info on these guys:
What Andrew and I discussed was the idea of writing some object which would only support the most basic operations: element-wise arithmetic, slicing, linear algebra calls on them (matrix-matrix, matrix-vector and vector operations), and little else. I'd be OK losing fancy indexing, byteswapping, memory-mapping, reshaping, and anything else which costs either:
1. initialization-time CPU cycles,
2. memory footprint, or
3. runtime element access and arithmetic cost.
Such objects could be very useful in many contexts. I'd even like an immutable version, so they could be used as dictionary keys without having to make a tuple out of them. This would allow algorithms which use small arrays as multidimensional indices in sparse tree structures to be used without the hoops one must jump through today, and with higher performance.
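To show what the dictionary-key use case looks like today, here is a hedged pure-Python sketch of an immutable, hashable small vector. The class name `FrozenVector` is hypothetical; this is plain Python over NumPy, not the optimized C object being proposed:

```python
# Sketch of an "immutable small array" usable as a dictionary key.
# FrozenVector is a hypothetical name; this only illustrates the semantics.
import numpy as np

class FrozenVector:
    __slots__ = ("_data", "_hash")

    def __init__(self, values):
        a = np.array(values)
        a.setflags(write=False)   # make the underlying buffer read-only
        self._data = a
        # hash over the raw bytes plus the shape, computed once
        self._hash = hash((a.shape, a.tobytes()))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return (isinstance(other, FrozenVector)
                and np.array_equal(self._data, other._data))

    def __getitem__(self, i):
        return self._data[i]

# usage: a sparse tree structure keyed by multidimensional indices,
# without building a tuple out of the index first
tree = {FrozenVector((1, 2, 3)): "leaf"}
print(tree[FrozenVector((1, 2, 3))])   # -> leaf
```

A C implementation could of course compute the hash far more cheaply, since it knows the element type and size at creation time.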
I wonder if giving up reshaping would allow the indexing code to be faster, since specialized versions could be hard-coded for each rank, with only, say, ranks 1-4 offered for this kind of object. (I know we recently had a discussion about large ranks, but this object would be geared towards pure performance; working in 32 dimensions is certainly a 'flexibility-driven' case, where the generic objects are called for.)
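The rank-specialization idea can be sketched in pure Python: a generic offset computation loops over the strides, while a hard-coded rank-2 version unrolls the loop entirely. These helper names are invented for illustration and this is not NumPy's actual indexing code:

```python
# Sketch: generic strided-offset computation vs. a version hard-coded for
# rank 2, as a specialized fixed-rank object could provide.
def flat_offset_generic(strides, index):
    # generic: works for any rank, pays for the loop machinery each call
    off = 0
    for s, i in zip(strides, index):
        off += s * i
    return off

def flat_offset_rank2(strides, index):
    # specialized: unrolled for rank 2, no loop, no length checks
    return strides[0] * index[0] + strides[1] * index[1]

# byte offset of element (2, 3) in a C-contiguous (3, 5) float64 array:
# strides are (40, 8) bytes
print(flat_offset_generic((40, 8), (2, 3)))   # 104
print(flat_offset_rank2((40, 8), (2, 3)))     # 104
```

In C the win would come from the compiler fully unrolling and register-allocating the fixed-rank version, which it cannot do for a loop over a runtime rank.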
Note that I had never mentioned this in public, because I think it may be a slight specialization that isn't needed early on, and so far the library's priority has been to get off the ground. But having such objects could be very handy, and now that the C API is starting to stabilize, maybe someone can play with this as a side project. Once they prove their worth, these beasts could be folded into the official distribution.
I am not really qualified to judge whether there are enough areas for optimization where the sacrifices indicated could really pay off, both in terms of memory and performance.