Francesc Alted wrote:
... Yeah, I think taking slices here is taking quite a lot of time:
In [58]: timeit E + Xi2[P/2,:]
100000 loops, best of 3: 3.95 µs per loop

In [59]: timeit E + Xi2[P/2]
100000 loops, best of 3: 2.17 µs per loop
I don't know why the additional ',:' in the index is taking so much time, but my guess is that passing and analyzing the second argument (slice(None,None,None)) could be responsible for the slowdown (although that seems like too much time for just that). Mmh, perhaps it would be worth studying this more carefully so that an optimization could be done in NumPy.
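To make the guess above concrete, here is a minimal sketch of what the two index forms hand to the indexing machinery, plus a way to time them yourself. The `ShowKey` helper, the array shapes, and the loop count are assumptions made for illustration; they are not taken from the original session. The sketch uses `P // 2` because `P/2` yields a float on Python 3 and cannot be used as an index (the original session presumably ran on Python 2).

```python
import timeit

import numpy as np


class ShowKey:
    """Hypothetical helper: returns whatever key the indexing syntax passes in."""

    def __getitem__(self, key):
        return key


show = ShowKey()
key_int = show[5]       # a plain integer
key_tuple = show[5, :]  # a tuple: (5, slice(None, None, None))

# Assumed shapes, chosen only so that the addition broadcasts cleanly.
P = 20
Xi2 = np.ones((P, 60))
E = np.zeros(60)
i = P // 2

# Both forms select the same row, so the sums are identical.
same = np.array_equal(E + Xi2[i, :], E + Xi2[i])

# Time each form; absolute numbers depend on the machine and NumPy version.
n_loops = 20_000
t_tuple = timeit.timeit(lambda: E + Xi2[i, :], number=n_loops)
t_int = timeit.timeit(lambda: E + Xi2[i], number=n_loops)

print("key for [5]:   ", repr(key_int))
print("key for [5, :]:", repr(key_tuple))
print("results equal: ", same)
print("per loop, [i, :]: %.3g s" % (t_tuple / n_loops))
print("per loop, [i]:    %.3g s" % (t_int / n_loops))
```

Whether the tuple form is still measurably slower will depend on the NumPy version in use, so the timings are best treated as something to check locally rather than a fixed ratio.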
This is indeed interesting! And very nice that this actually works the way you'd expect it to. I guess I've just worked too long with Matlab :)
I think the lesson mostly should be that with so little data, benchmarking becomes a very difficult art.
Well, I think it is not difficult, it is just that you are perhaps benchmarking Python/NumPy machinery instead ;-) I'm curious whether Matlab can do slicing much faster than NumPy. Jasper?
I had a look, these are the timings for Python for 60x20:

  Dot product:  0.051165 (5.116467e-06 per iter)
  Add a row:    0.092849 (9.284860e-06 per iter)
  Add a column: 0.082523 (8.252348e-06 per iter)

For Matlab 60x20:

  Dot product:  0.029927 (2.992664e-006 per iter)
  Add a row:    0.019664 (1.966444e-006 per iter)
  Add a column: 0.008384 (8.384376e-007 per iter)

For Python 600x200:

  Dot product:  1.917235 (1.917235e-04 per iter)
  Add a row:    0.113243 (1.132425e-05 per iter)
  Add a column: 0.162740 (1.627397e-05 per iter)

For Matlab 600x200:

  Dot product:  1.282778 (1.282778e-004 per iter)
  Add a row:    0.107252 (1.072525e-005 per iter)
  Add a column: 0.021325 (2.132527e-006 per iter)

If I fit a line through these two data points (60 and 600 rows), I get the following equations:

  Python, AR (add a row):    3.8e-5 * n + 0.091
  Matlab, AC (add a column): 2.4e-5 * n + 0.0069

This would suggest that Matlab performs the vector addition about 1.6 times faster and has a 13 times smaller constant cost!

As for the questions about what I'm trying to compute: these tests are minimized as much as possible to show the bottleneck I encountered; they are part of a larger loop where it does make sense. In essence I'm iteratively adjusting w, and E has to keep up (because that's what is used to determine the next change). Instead of recomputing E all the time based on E = Xi*w, a little linear algebra shows that the vector addition is sufficient.
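The "little linear algebra" can be sketched as follows, under assumptions of my own about shapes and names: if Xi is an N x P matrix, w has length P, and a step changes only one component w[j] by delta, then E = Xi*w changes by delta times column j of Xi. The variable names `j` and `delta` and the random data are hypothetical; the real loop may update w differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: Xi is (N, P), w is (P,), so E = Xi @ w is (N,).
N, P = 60, 20
Xi = rng.standard_normal((N, P))
w = rng.standard_normal(P)

# Full computation, done once up front.
E = Xi @ w

# Hypothetical update step: adjust a single weight by delta.
j, delta = 3, 0.25
w[j] += delta

# Incremental update: one scaled vector addition instead of a full product.
# Xi @ (w + delta * e_j) = Xi @ w + delta * Xi[:, j]
E += delta * Xi[:, j]

# The incrementally maintained E matches a full recomputation.
print(np.allclose(E, Xi @ w))
```

The incremental form costs O(N) per step instead of O(N*P), which is why the speed of the single vector addition dominates the loop.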