[Numpy-discussion] C-coded dot 1000x faster than numpy?
Roman Yurchak
rth.yurchak at gmail.com
Tue Feb 23 13:40:22 EST 2021
For the first benchmark apparently A.dot(B) with A real and B complex is
a known issue performance wise https://github.com/numpy/numpy/issues/10468
In general, it might be worth trying different BLAS backends. For
instance, if you install numpy from conda-forge you should be able to
switch between OpenBLAS, MKL and BLIS:
https://conda-forge.org/docs/maintainer/knowledge_base.html#switching-blas-implementation
Roman
On 23/02/2021 19:32, Andrea Gavana wrote:
> Hi,
>
> On Tue, 23 Feb 2021 at 19.11, Neal Becker <ndbecker2 at gmail.com
> <mailto:ndbecker2 at gmail.com>> wrote:
>
> I have code that performs dot product of a 2D matrix of size (on the
> order of) [1000,16] with a vector of size [1000]. The matrix is
> float64 and the vector is complex128. I was using numpy.dot but it
> turned out to be a bottleneck.
>
> So I coded dot2x1 in c++ (using xtensor-python just for the
> interface). No fancy simd was used, unless g++ did it on it's own.
>
> On a simple benchmark using timeit I find my hand-coded routine is on
> the order of 1000x faster than numpy? Here is the test code:
> My custom c++ code is dot2x1. I'm not copying it here because it has
> some dependencies. Any idea what is going on?
>
>
>
> I had a similar experience - albeit with an older numpy and Python 2.7,
> so my comments are easily outdated and irrelevant. This was on Windows
> 10 64 bit, way more than plenty RAM.
>
> It took me forever to find out that numpy.dot was the culprit, and I
> ended up using fortran + f2py. Even with the overhead of having to go
> through f2py bridge, the fortran dot_product was several times faster.
>
> Sorry if It doesn’t help much.
>
> Andrea.
>
>
>
>
> import numpy as np
>
> from dot2x1 import dot2x1
>
> a = np.ones ((1000,16))
> b = np.array([ 0.80311816+0.80311816j, 0.80311816-0.80311816j,
> -0.80311816+0.80311816j, -0.80311816-0.80311816j,
> 1.09707981+0.29396165j, 1.09707981-0.29396165j,
> -1.09707981+0.29396165j, -1.09707981-0.29396165j,
> 0.29396165+1.09707981j, 0.29396165-1.09707981j,
> -0.29396165+1.09707981j, -0.29396165-1.09707981j,
> 0.25495815+0.25495815j, 0.25495815-0.25495815j,
> -0.25495815+0.25495815j, -0.25495815-0.25495815j])
>
> def F1():
> d = dot2x1 (a, b)
>
> def F2():
> d = np.dot (a, b)
>
> from timeit import timeit
> print (timeit ('F1()', globals=globals(), number=1000))
> print (timeit ('F2()', globals=globals(), number=1000))
>
> In [13]: 0.013910860987380147 << 1st timeit
> 28.608758996007964 << 2nd timeit
> --
> Those who don't understand recursion are doomed to repeat it
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> https://mail.python.org/mailman/listinfo/numpy-discussion
> <https://mail.python.org/mailman/listinfo/numpy-discussion>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
More information about the NumPy-Discussion
mailing list