[Numpy-discussion] Perf regression with Pythran between Numpy 1.19.5 and 1.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)

Sebastian Berg sebastian at sipsolutions.net
Mon Mar 15 11:11:20 EDT 2021


On Mon, 2021-03-15 at 14:59 +0100, Peter Andreas Entschev wrote:
> Hi Pierre,
> 
> Thanks for pinging me. To put it in the simplest way possible, that
> PR adds a new `like` kwarg that dispatches to downstream libraries
> via `__array_function__` when specified, and otherwise falls back to
> the default behavior of NumPy. While that introduces an extra check
> on the C side, it should have minimal impact for use cases that
> don't use the `like` kwarg.
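> 
> For illustration, a minimal sketch of that dispatch (the `MyArray`
> class here is hypothetical, just enough to trigger the protocol):
> 
> import numpy as np
> 
> class MyArray:
>     # Minimal __array_function__ so that `like=` dispatches here.
>     def __array_function__(self, func, types, args, kwargs):
>         print("dispatched:", func.__name__)
>         return func(*args, **kwargs)
> 
> ref = MyArray()
> a = np.asarray([1, 2, 3], like=ref)  # goes through __array_function__
> b = np.asarray([1, 2, 3])            # default NumPy path, no dispatch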
> 
> Is there a simple reproducer with NumPy only? I assume your case with
> Pandas is much more complex (unfortunately I'm not very experienced
> with DataFrames), but curiously I see NumPy 1.20.1 being considerably
> faster for small arrays and mildly faster with large arrays (results
> in https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f).


1.20.1 should have some small overhead reductions there, since the
array-object life-cycle is probably around 30% faster (deleting an
array is faster).  But the array-object life-cycle is pretty
insignificant aside from creating views.
There are also many performance improvements around SIMD, which will
affect certain math operations.

The changes on that PR may add additional overhead to array creation
(something that should go away again in 1.21 and end up being much
faster when https://github.com/numpy/numpy/pull/15270 goes in).  But
that is all.
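
For reference, a rough way to see per-call array-creation overhead in
isolation (a sketch; the array size is arbitrary and the absolute
numbers are machine-dependent):

import timeit

# Run under each NumPy version and compare; this times bare array
# creation (and deallocation), which is where such a check would show.
t = timeit.timeit("np.empty(4)", setup="import numpy as np",
                  number=1_000_000)
print(f"{t * 1e3:.1f} ns per call")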


As much as I would love to have an answer, looking for changes in the
NumPy code seems to me unlikely to get you anywhere. As another
example, check out this benchmark from the NumPy benchmark suite:

https://pv.github.io/numpy-bench/index.html#bench_reduce.AddReduceSeparate.time_reduce?cpu=Intel(R)%20Core(TM)%20i7%20CPU%20920%20%40%202.67GHz&machine=i7&os=Linux&ram=16416652&Cython=0.29.21&p-axis=1&p-type='int16'&p-type='int32'

It keeps jumping back and forth by around 30% for the 'int16' version,
but the 'int32' one is pretty much stable, so it's unlikely to be just
bad benchmarking.

Right now, I am willing to bet that if you repeat that whole thing
with a different commit range, you will find another random bad
commit.

Cheers,

Sebastian


> 
> Best,
> Peter
> 
> 
> 
> On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER
> <pierre.augier at univ-grenoble-alpes.fr> wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Juan Nunez-Iglesias" <jni at fastmail.com>
> > > To: "numpy-discussion" <numpy-discussion at python.org>
> > > Sent: Sunday, 14 March 2021 07:15:39
> > > Subject: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 1.19.5 and 1.20 explaining a perf regression with Pythran
> > 
> > > Hi Pierre,
> > > 
> > > If you’re able to compile NumPy locally and you have reliable
> > > benchmarks, you
> > > can write a script that tests the runtime of your benchmark and
> > > reports it as a
> > > test pass/fail. You can then use “git bisect run” to
> > > automatically find the
> > > commit that caused the issue. That will help narrow down the
> > > discussion before
> > > it gets completely derailed a second time.
> > > 
> > > https://lwn.net/Articles/317154/
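> > > 
> > > A minimal sketch of such a bisect helper (the file name, the
> > > 1.15 s threshold, and the in-tree `pip install .` build step are
> > > made up for illustration; bench.py stands for your benchmark
> > > script):
> > > 
> > > # check_perf.py -- to be used with `git bisect run`
> > > import subprocess, sys, time
> > > 
> > > # Build this checkout; exit code 125 tells bisect to skip
> > > # commits that fail to build.
> > > if subprocess.call(["pip", "install", ".", "-q"]) != 0:
> > >     sys.exit(125)
> > > t0 = time.perf_counter()
> > > if subprocess.call([sys.executable, "bench.py"]) != 0:
> > >     sys.exit(125)  # benchmark itself failed: skip as well
> > > elapsed = time.perf_counter() - t0
> > > sys.exit(0 if elapsed < 1.15 else 1)  # 0 = good, 1 = bad (regressed)
> > > 
> > > and then something like:
> > > 
> > > git bisect start
> > > git bisect bad v1.20.0
> > > git bisect good v1.19.5
> > > git bisect run python check_perf.py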
> > > 
> > > Juan.
> > 
> > Thanks a lot for this advice Juan! I wasn't able to use Git but
> > with `hg bisect` I managed to find that the first "bad" commit is
> > 
> > https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f
> >    ENH: implement NEP-35's `like=` argument (gh-16935)
> > 
> > From the point of view of my benchmark, this commit changes the
> > behavior of arr.copy() (the resulting arrays do not give the same
> > performance). This makes sense because it is indeed about array
> > creation.
> > 
> > I haven't yet studied this commit in detail (it is quite big and
> > not simple) and I'm not sure I'm going to be able to understand
> > it, and in particular why it leads to such a performance
> > regression!
> > 
> > Cheers,
> > 
> > Pierre
> > 
> > > 
> > > 
> > > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER
> > > <pierre.augier at univ-grenoble-alpes.fr> wrote:
> > > 
> > > Hi,
> > > 
> > > I tried to compile Numpy with `pip install numpy==1.20.1
> > > --no-binary numpy --force-reinstall` and I can reproduce the
> > > regression.
> > > 
> > > Good news, I was able to reproduce the difference with only Numpy
> > > 1.20.1.
> > > 
> > > Arrays prepared with (`df` is a Pandas dataframe)
> > > 
> > > arr = df.values.copy()
> > > 
> > > or
> > > 
> > > arr = np.ascontiguousarray(df.values)
> > > 
> > > lead to "slow" execution while arrays prepared with
> > > 
> > > arr = np.copy(df.values)
> > > 
> > > lead to faster execution.
> > > 
> > > arr.copy() and np.copy(arr) do not give the same result, with
> > > arr obtained from a Pandas dataframe with arr = df.values. It's
> > > strange because type(df.values) gives <class 'numpy.ndarray'>,
> > > so I would expect arr.copy() and np.copy(arr) to give exactly
> > > the same result.
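> > > 
> > > One documented difference that could explain it: np.copy
> > > defaults to order='K' (preserve the input layout), while
> > > ndarray.copy defaults to order='C'. A self-contained sketch
> > > (emulating df.values as a transposed view, which is an
> > > assumption about the Pandas case):
> > > 
> > > import numpy as np
> > > 
> > > base = np.random.rand(3, 1000)
> > > values = base.T      # non-contiguous view, as df.values may be
> > > a = values.copy()    # ndarray.copy: order='C' by default
> > > b = np.copy(values)  # np.copy: order='K', keeps the F-layout here
> > > print(a.flags['C_CONTIGUOUS'], b.flags['C_CONTIGUOUS'])  # True False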
> > > 
> > > Note that I think I'm doing quite serious and reproducible
> > > benchmarks. I also
> > > checked that this regression is reproducible on another computer.
> > > 
> > > Cheers,
> > > 
> > > Pierre
> > > 
> > > ----- Original Message -----
> > > From: "Sebastian Berg" <sebastian at sipsolutions.net>
> > > To: "numpy-discussion" <numpy-discussion at python.org>
> > > Sent: Friday, 12 March 2021 22:50:24
> > > Subject: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 1.19.5 and 1.20 explaining a perf regression with Pythran
> > > 
> > > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > 
> > > Hi,
> > > 
> > > I'm looking for a difference between Numpy 1.19.5 and 1.20 which
> > > could explain a performance regression (~15 %) with Pythran.
> > > 
> > > I observe this regression with the script
> > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > > 
> > > Pythran reimplements Numpy, so this is not about Numpy's code
> > > for computation. However, Pythran of course uses the native
> > > array contained in a Numpy array. I'm quite sure that something
> > > has changed between Numpy 1.19.5 and 1.20 (or between the
> > > corresponding wheels?) since I don't get the same performance
> > > with Numpy 1.20. I checked that the values in the arrays are the
> > > same and that the flags characterizing the arrays are also the
> > > same.
> > > 
> > > Good news, I'm now able to obtain the performance difference
> > > just with Numpy 1.19.5. In this code, I load the data with
> > > Pandas and need to prepare contiguous Numpy arrays to give them
> > > to Pythran. With Numpy 1.19.5, if I use np.copy I get better
> > > performance than with np.ascontiguousarray. With Numpy 1.20,
> > > both functions create arrays giving the same performance with
> > > Pythran (again, less good than with Numpy 1.19.5).
> > > 
> > > Note that this code is very efficient (more than 100 times
> > > faster than using Numpy), so I guess that things like alignment
> > > or memory location can lead to such a difference.
> > > 
> > > More details in this issue:
> > > https://github.com/serge-sans-paille/pythran/issues/1735
> > > 
> > > Any help to understand what has changed would be greatly
> > > appreciated!
> > > 
> > > If you want to really dig into this, it would be good to do
> > > profiling to find out where the differences are.
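> > > 
> > > (Since a Pythran kernel spends nearly all its time in native
> > > code, a sampling profiler on the whole process is the practical
> > > route; a sketch, assuming Linux perf is available:
> > > 
> > > perf record -g python bench.py
> > > perf report
> > > 
> > > Python-level profilers such as cProfile will not see inside the
> > > compiled extension.)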
> > > 
> > > Without that, I don't have much appetite to investigate
> > > personally. The reason is that fluctuations of ~30% (or even
> > > much more) when running the NumPy benchmarks are very common.
> > > 
> > > I am not aware of an immediate change in NumPy, especially since
> > > you are talking about Pythran, and only the memory space or the
> > > interface code should matter.
> > > As to the interface code... I would expect it to be quite a bit
> > > faster, not slower.
> > > There was no change around data allocation, so at best what you
> > > are seeing is a different pattern in how the "small array cache"
> > > ends up being used.
> > > 
> > > Unfortunately, getting stable benchmarks that reflect code
> > > changes exactly is tough... Here is a nice blog post from Victor
> > > Stinner where he had to go as far as using "profile guided
> > > compilation" to avoid fluctuations:
> > > 
> > > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > > 
> > > I somewhat hope that this is also the reason for the huge
> > > fluctuations we see in the NumPy benchmarks due to absolutely
> > > unrelated code changes.
> > > But I did not have the energy to try it (and a probably fixed
> > > bug in gcc makes it a bit harder right now).
> > > 
> > > Cheers,
> > > 
> > > Sebastian
> > > 
> > > 
> > > Cheers,
> > > 
> > > Pierre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
