[Numpy-discussion] Perf regression with Pythran between Numpy 1.19.5 and 1.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)

PIERRE AUGIER pierre.augier at univ-grenoble-alpes.fr
Mon Mar 15 07:29:02 EDT 2021


----- Original Message -----
> From: "Juan Nunez-Iglesias" <jni at fastmail.com>
> To: "numpy-discussion" <numpy-discussion at python.org>
> Sent: Sunday, 14 March 2021 07:15:39
> Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 1.19.5 and 1.20 explaining a perf regression with Pythran

> Hi Pierre,
> 
> If you’re able to compile NumPy locally and you have reliable benchmarks, you
> can write a script that tests the runtime of your benchmark and reports it as a
> test pass/fail. You can then use “git bisect run” to automatically find the
> commit that caused the issue. That will help narrow down the discussion before
> it gets completely derailed a second time. 😂
> 
> https://lwn.net/Articles/317154/
> 
> Juan.

Thanks a lot for this advice, Juan! I wasn't able to use Git, but with `hg bisect` I managed to find that the first "bad" commit is

https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f   ENH: implement NEP-35's `like=` argument (gh-16935)
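
For what it's worth, here is a minimal sketch of the kind of pass/fail script one can give to `git bisect run` or `hg bisect --command` (the script name check_perf.py, the threshold, and the rebuild step are illustrative, not exactly what I used):

    #!/usr/bin/env python3
    # check_perf.py (hypothetical name). git bisect run / hg bisect --command
    # classify the revision being tested by this script's exit code.
    import subprocess
    import sys
    import time

    THRESHOLD = 2.0  # seconds; illustrative, tune to the actual benchmark

    # Rebuild NumPy at the revision being tested; exit 125 ("skip")
    # if the build fails, so bisect does not misclassify the revision.
    build = subprocess.run([sys.executable, "-m", "pip", "install", ".",
                            "--force-reinstall"])
    if build.returncode != 0:
        sys.exit(125)

    # Time the benchmark and report 0 ("good") if it is fast enough,
    # 1 ("bad") otherwise.
    start = time.perf_counter()
    subprocess.run([sys.executable, "bench.py"], check=True)
    elapsed = time.perf_counter() - start
    sys.exit(0 if elapsed < THRESHOLD else 1)

With Git, `git bisect start <bad-commit> <good-commit>` followed by `git bisect run python check_perf.py` then walks the history automatically; `hg bisect --command` does the equivalent with Mercurial.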

From the point of view of my benchmark, this commit changes the behavior of arr.copy(): the arrays obtained before and after this commit do not give the same performance. This makes sense because the commit is indeed about array creation.
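
For anyone who wants to dig in, a small probe along these lines can be run under each NumPy build to compare what the two copy functions return (a sketch; the 64-byte modulus is just one guess at a boundary that could matter):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(1024, 4))
    arr = df.values

    # Dump contiguity, strides and data-pointer alignment for both copies.
    for name, a in [("arr.copy()", arr.copy()), ("np.copy(arr)", np.copy(arr))]:
        print(name, a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"],
              a.strides, a.ctypes.data % 64)

    # Note: np.copy() defaults to order='K' (keep the source layout) while
    # ndarray.copy() defaults to order='C', so the two can legitimately
    # differ when the source array is not C-contiguous.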

I haven't yet studied this commit in detail (it is quite big and not simple) and I'm not sure I'm going to be able to understand it, in particular why it leads to such a performance regression!

Cheers,

Pierre

> 
> On 13 Mar 2021, at 10:34 am, PIERRE AUGIER
> <pierre.augier at univ-grenoble-alpes.fr> wrote:
> 
> Hi,
> 
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy
> --force-reinstall` and I can reproduce the regression.
> 
> Good news, I was able to reproduce the difference with only Numpy 1.20.1.
> 
> Arrays prepared with (`df` is a Pandas dataframe)
> 
> arr = df.values.copy()
> 
> or
> 
> arr = np.ascontiguousarray(df.values)
> 
> lead to "slow" execution while arrays prepared with
> 
> arr = np.copy(df.values)
> 
> lead to faster execution.
> 
> arr.copy() and np.copy(arr) do not give the same result, with arr obtained from a
> Pandas dataframe with arr = df.values. It's strange, because type(df.values)
> gives <class 'numpy.ndarray'>, so I would expect arr.copy() and np.copy(arr) to
> give exactly the same result.
> 
> Note that I think these benchmarks are quite serious and reproducible. I also
> checked that this regression is reproducible on another computer.
> 
> Cheers,
> 
> Pierre
> 
> ----- Original Message -----
> From: "Sebastian Berg" <sebastian at sipsolutions.net>
> To: "numpy-discussion" <numpy-discussion at python.org>
> Sent: Friday, 12 March 2021 22:50:24
> Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 1.19.5 and 1.20 explaining a perf regression with Pythran
> 
> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> 
> Hi,
> 
> I'm looking for a difference between Numpy 1.19.5 and 1.20 which
> could explain a performance regression (~15%) with Pythran.
> 
> I observe this regression with the script
> https://github.com/paugier/nbabel/blob/master/py/bench.py
> 
> Pythran reimplements Numpy, so this is not about Numpy code for
> computation. However, Pythran of course uses the native array
> contained in a Numpy array. I'm quite sure that something has changed
> between Numpy 1.19.5 and 1.20 (or between the corresponding wheels?)
> since I don't get the same performance with Numpy 1.20. I checked
> that the values in the arrays are the same and that the flags
> characterizing the arrays are also the same.
> 
> Good news, I'm now able to obtain the performance difference just
> with Numpy 1.19.5. In this code, I load the data with Pandas and need
> to prepare contiguous Numpy arrays to give them to Pythran. With
> Numpy 1.19.5, if I use np.copy I get better performance than with
> np.ascontiguousarray. With Numpy 1.20, both functions create arrays
> giving the same performance with Pythran (again, worse than with
> Numpy 1.19.5).
> 
> Note that this code is very efficient (more than 100 times faster
> than using Numpy), so I guess that things like alignment or memory
> location can lead to such a difference.
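> 
> One way to probe the alignment hypothesis directly is to put the same
> data at controlled offsets from a cache-line boundary and time the
> Pythran kernel on each copy. A sketch (copy_at_offset is a hypothetical
> helper, and 64 is just a guessed boundary):
> 
>     import numpy as np
> 
>     def copy_at_offset(a, offset, align=64):
>         # Over-allocate a byte buffer, then view the data `offset` bytes
>         # past an `align`-byte boundary, so the alignment is chosen by us.
>         buf = np.empty(a.nbytes + align + offset, dtype=np.uint8)
>         start = (-buf.ctypes.data) % align + offset
>         out = buf[start:start + a.nbytes].view(a.dtype).reshape(a.shape)
>         out[...] = a
>         return out
> 
>     # e.g. time the benchmark on copy_at_offset(arr, 0),
>     # copy_at_offset(arr, 8), copy_at_offset(arr, 32), ...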
> 
> More details in this issue:
> https://github.com/serge-sans-paille/pythran/issues/1735
> 
> Any help to understand what has changed would be greatly appreciated!
> 
> 
> If you want to really dig into this, it would be good to do profiling
> to find out where the differences are.
> 
> Without that, I don't have much appetite to investigate personally. The
> reason is that fluctuations of ~30% (or even much more) when running
> the NumPy benchmarks are very common.
> 
> I am not aware of an immediate change in NumPy, especially since you
> are talking about Pythran, and only the memory space or the interface
> code should matter. As to the interface code... I would expect it to be
> quite a bit faster, not slower. There was no change around data
> allocation, so at best what you are seeing is a different pattern in
> how the "small array cache" ends up being used.
> 
> Unfortunately, getting stable benchmarks that reflect code changes
> exactly is tough... Here is a nice blog post from Victor Stinner where
> he had to go as far as using "profile guided compilation" to avoid
> fluctuations:
> 
> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> 
> I somewhat hope that this is also the reason for the huge fluctuations
> we see in the NumPy benchmarks due to absolutely unrelated code
> changes. But I did not have the energy to try it (and a probably fixed
> bug in gcc makes it a bit harder right now).
> 
> Cheers,
> 
> Sebastian
> 
> 
> Cheers,
> 
> Pierre
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

