[Numpy-discussion] low level optimization in NumPy and minivect

Mon Jun 17 17:29:09 EDT 2013

On 06/17/2013 11:03 PM, Julian Taylor wrote:
> On 17.06.2013 17:11, Frédéric Bastien wrote:
>> Hi,
>>
>> I saw that Julian Taylor has recently been doing many low-level
>> optimizations, such as using SSE instructions. I think it is great.
>>
>> Last year, Mark Florisson released the minivect[1] project that he
>> worked on during his master's thesis. minivect is a compiler for
>> element-wise expressions that does some of the same low-level
>> optimizations that Julian is doing in NumPy right now.
>>
>> Mark wrote minivect in a way that allows it to be reused by other
>> projects. It is now used by Cython and Numba, I think. I had planned to
>> reuse it in Theano, but I haven't had the time to integrate it so far.
>>
>> What about reusing it in NumPy? I think that some of Julian's
>> optimizations aren't in minivect (I didn't check to confirm). But from
>> what I heard, minivect doesn't implement reductions, and there is a pull
>> request to optimize this in NumPy.
>
> Hi,
> what I vectorized is just the really easy cases of unit-stride
> contiguous operations, so the min/max reductions which are now in numpy
> are in essence pretty trivial.
> minivect goes much further in optimizing general strided access and
> broadcasting via loop optimizations (it seems to have a lot of overlap
> with the graphite loop optimizer available in GCC [0]) so my code is
> probably not of very much use to minivect.
>
> The most interesting part in minivect for numpy is probably the
> optimization of broadcasting loops which seem to be pretty inefficient
> in numpy [0].

There are also related things like

arr + arr.T

which has far from optimal performance in NumPy (unless there have been
recent changes). This example was one of the motivating examples for
minivect.
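
For anyone who wants to try this locally, here is a rough sketch. In
arr + arr.T one operand is read along rows and the other along columns,
so no single loop order is cache-friendly for both. The helper
add_transpose_blocked below is hypothetical (it is not a NumPy or
minivect API) and the tile size of 64 is an arbitrary guess. Tiling is
one standard loop optimization for this access pattern; whether it
matches what minivect actually generates is an assumption on my part.
No timings are quoted, since they depend on the NumPy version and the
hardware.

```python
import timeit

import numpy as np


def add_transpose_blocked(arr, block=64):
    """Compute arr + arr.T one square tile at a time.

    Hypothetical helper for illustration only; the block size is a guess.
    """
    n, m = arr.shape
    if n != m:
        raise ValueError("arr must be square")
    out = np.empty_like(arr)
    for i in range(0, n, block):
        for j in range(0, n, block):
            # Tile (i, j) of arr.T is the transpose of tile (j, i) of arr.
            out[i:i + block, j:j + block] = (
                arr[i:i + block, j:j + block]
                + arr[j:j + block, i:i + block].T
            )
    return out


def time_variants(n=1024, repeats=3):
    """Return the best-of-`repeats` wall time in seconds for each variant."""
    arr = np.random.rand(n, n)
    candidates = {
        "arr + arr": lambda: arr + arr,        # both operands contiguous
        "arr + arr.T": lambda: arr + arr.T,    # one operand read by columns
        "blocked": lambda: add_transpose_blocked(arr),
    }
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeats))
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in time_variants().items():
        print(f"{name:12s} {seconds * 1e3:8.2f} ms")
```

Comparing the "arr + arr" and "arr + arr.T" rows shows how much the
transposed operand costs on a given machine. The blocked variant pays
Python-level loop overhead, so it only indicates whether tiling helps,
not what a compiled loop nest would achieve.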

Dag Sverre