[Numpy-discussion] C-coded dot 1000x faster than numpy?

Carl Kleffner cmkleffner at gmail.com
Tue Feb 23 14:19:12 EST 2021


https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication

maybe C_CONTIGUOUS vs F_CONTIGUOUS?

Carl


Am Di., 23. Feb. 2021 um 19:52 Uhr schrieb Neal Becker <ndbecker2 at gmail.com
>:

> One suspect is that it seems the numpy version was multi-threading.
> This isn't useful here, because I'm running parallel monte-carlo
> simulations using all cores.  Perhaps this is perversely slowing
> things down?  I don't know how to account for 1000x  slowdown though.
>
> On Tue, Feb 23, 2021 at 1:40 PM Roman Yurchak <rth.yurchak at gmail.com>
> wrote:
> >
> > For the first benchmark apparently A.dot(B) with A real and B complex is
> > a known issue performance wise
> https://github.com/numpy/numpy/issues/10468
> >
> > In general, it might be worth trying different BLAS backends. For
> > instance, if you install numpy from conda-forge you should be able to
> > switch between OpenBLAS, MKL and BLIS:
> >
> https://conda-forge.org/docs/maintainer/knowledge_base.html#switching-blas-implementation
> >
> > Roman
> >
> > On 23/02/2021 19:32, Andrea Gavana wrote:
> > > Hi,
> > >
> > > On Tue, 23 Feb 2021 at 19.11, Neal Becker <ndbecker2 at gmail.com
> > > <mailto:ndbecker2 at gmail.com>> wrote:
> > >
> > >     I have code that performs dot product of a 2D matrix of size (on
> the
> > >     order of) [1000,16] with a vector of size [1000].  The matrix is
> > >     float64 and the vector is complex128.  I was using numpy.dot but it
> > >     turned out to be a bottleneck.
> > >
> > >     So I coded dot2x1 in c++ (using xtensor-python just for the
> > >     interface).  No fancy simd was used, unless g++ did it on it's own.
> > >
> > >     On a simple benchmark using timeit I find my hand-coded routine is
> on
> > >     the order of 1000x faster than numpy?  Here is the test code:
> > >     My custom c++ code is dot2x1.  I'm not copying it here because it
> has
> > >     some dependencies.  Any idea what is going on?
> > >
> > >
> > >
> > > I had a similar experience - albeit with an older numpy and Python 2.7,
> > > so my comments are easily outdated and irrelevant. This was on Windows
> > > 10 64 bit, way more than plenty RAM.
> > >
> > > It took me forever to find out that numpy.dot was the culprit, and I
> > > ended up using fortran + f2py. Even with the overhead of having to go
> > > through f2py bridge, the fortran dot_product was several times faster.
> > >
> > > Sorry if It doesn’t help much.
> > >
> > > Andrea.
> > >
> > >
> > >
> > >
> > >     import numpy as np
> > >
> > >     from dot2x1 import dot2x1
> > >
> > >     a = np.ones ((1000,16))
> > >     b = np.array([ 0.80311816+0.80311816j,  0.80311816-0.80311816j,
> > >             -0.80311816+0.80311816j, -0.80311816-0.80311816j,
> > >              1.09707981+0.29396165j,  1.09707981-0.29396165j,
> > >             -1.09707981+0.29396165j, -1.09707981-0.29396165j,
> > >              0.29396165+1.09707981j,  0.29396165-1.09707981j,
> > >             -0.29396165+1.09707981j, -0.29396165-1.09707981j,
> > >              0.25495815+0.25495815j,  0.25495815-0.25495815j,
> > >             -0.25495815+0.25495815j, -0.25495815-0.25495815j])
> > >
> > >     def F1():
> > >          d = dot2x1 (a, b)
> > >
> > >     def F2():
> > >          d = np.dot (a, b)
> > >
> > >     from timeit import timeit
> > >     print (timeit ('F1()', globals=globals(), number=1000))
> > >     print (timeit ('F2()', globals=globals(), number=1000))
> > >
> > >     In [13]: 0.013910860987380147 << 1st timeit
> > >     28.608758996007964  << 2nd timeit
> > >     --
> > >     Those who don't understand recursion are doomed to repeat it
> > >     _______________________________________________
> > >     NumPy-Discussion mailing list
> > >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> > >     https://mail.python.org/mailman/listinfo/numpy-discussion
> > >     <https://mail.python.org/mailman/listinfo/numpy-discussion>
> > >
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
>
> --
> Those who don't understand recursion are doomed to repeat it
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210223/04bf5363/attachment.html>


More information about the NumPy-Discussion mailing list