[Numpy-discussion] Proposed Roadmap Overview

Mon Feb 20 13:02:17 EST 2012

On 20.02.2012 18:34, Christopher Jordan-Squire wrote:
> I don't follow this. Could you expand a bit more? (Specifically, I 
> wasn't aware that numpy could be 10-20x slower than a cython loop, if 
> we're talking about the base numpy library--so core operations. I'm 
> also not totally sure why a JIT is a 2x improvement or so vs. cython. 
> Not that I disagree on either of these points, I'd just like a bit 
> more detail.) 

Dag Sverre is right about this.

NumPy is memory bound; Cython loops are (usually) CPU bound.

If you write:

     x[:] = a + b + c  # numpy arrays

then this happens (excluding reference counting):

- allocate temporary array
- loop over a and b, add to temporary
- allocate 2nd temporary array
- loop over 1st temporary array and c, add to 2nd
- deallocate 1st temporary array
- loop over 2nd temporary array, assign to x
- deallocate 2nd temporary array
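As a rough NumPy sketch of the steps above (the function names are hypothetical), the first function spells out the two temporaries explicitly; the second gets the same result without allocating them by passing the ufunc `out=` argument. The second version still makes two full passes over memory, so it saves the allocations but not the extra memory traffic. Later NumPy releases (1.13 and newer) can elide some of these temporaries automatically on some platforms, so the exact behaviour depends on the version.

```python
import numpy as np

def sum3_with_temporaries(a, b, c, x):
    # Roughly what `x[:] = a + b + c` does: two full-size temporaries.
    tmp1 = np.add(a, b)     # allocate 1st temporary, loop over a and b
    tmp2 = np.add(tmp1, c)  # allocate 2nd temporary, loop over tmp1 and c
    del tmp1                # 1st temporary released
    x[:] = tmp2             # loop over 2nd temporary, copy into x
    del tmp2                # 2nd temporary released

def sum3_in_place(a, b, c, x):
    # Same result without temporaries, writing straight into x.
    # Assumes x does not share memory with c (c would be overwritten
    # by the first call before it is read).
    np.add(a, b, out=x)
    np.add(x, c, out=x)
```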

Since memory access is slow, memory allocation and deallocation are
slow, and computation is fast, this can be perhaps 10 times
slower than what we could do with a loop in Cython:

     for i in range(n):  # n = len(x); i and n typed as Py_ssize_t in Cython
         x[i] = a[i] + b[i] + c[i]

I.e., we get rid of the temporary arrays and the multiple loops.
All the intermediate values here are kept in registers.

It is streaming data into the CPU that is slow, not computing!
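A middle ground that stays in NumPy is to evaluate the expression in blocks small enough to stay in cache, so the only temporary is a small reused buffer. This is a sketch, not a tuned implementation: `sum3_blocked` and its `block` size are hypothetical, and it assumes 1-D arrays of equal length.

```python
import numpy as np

def sum3_blocked(a, b, c, x, block=4096):
    # `block` is a hypothetical tuning knob: 4096 float64 values = 32 KiB,
    # intended to keep the working buffer resident in CPU cache.
    n = x.shape[0]
    buf = np.empty(block, dtype=x.dtype)  # one small temporary, reused
    for start in range(0, n, block):
        stop = min(start + block, n)
        t = buf[: stop - start]
        np.add(a[start:stop], b[start:stop], out=t)  # a + b for this block only
        np.add(t, c[start:stop], out=x[start:stop])  # write the result once
    return x
```

Each element of a, b and c is still read from main memory once and x is written once, while the intermediate a + b never has to leave cache. The loop over blocks runs in Python, so the block size must be large enough to amortize that overhead; a compiled fused loop avoids the trade-off entirely.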

People have actually experimented with streaming data in
compressed form and decompressing it on the fly, because data
access still dominates the runtime (even if you do a lot of
computing per element).
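The layout behind that idea can be sketched with `zlib` from the standard library: store the array as independently compressed chunks and decompress one chunk at a time while computing. The helpers are hypothetical and zlib is only a stand-in codec; whether this saves time in practice depends on decompression being faster than memory can deliver the uncompressed data, which this Python sketch does not demonstrate.

```python
import zlib
import numpy as np

def compress_chunks(arr, chunk=65536):
    # Hypothetical helper: store a 1-D array as independently compressed chunks
    # (level 1 = fastest zlib compression).
    return [zlib.compress(arr[i:i + chunk].tobytes(), 1)
            for i in range(0, arr.shape[0], chunk)]

def sum_compressed(chunks, dtype):
    # Decompress one chunk at a time so only a small buffer is live at once.
    total = 0.0
    for blob in chunks:
        part = np.frombuffer(zlib.decompress(blob), dtype=dtype)
        total += part.sum()
    return total
```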

Sturla