Benchmarks: why we need PEP 576/579/580
Hello,

I finally managed to get some real-life benchmarks for why we need a faster C calling protocol (see PEPs 576, 579, 580).

I focused on the Cython compilation of SageMath. By default, a function in Cython is an instance of builtin_function_or_method (analogously, method_descriptor for a method), which has special optimizations in the CPython interpreter. But the option "binding=True" changes those to a custom class which is NOT optimized.

I ran the full SageMath testsuite several times without and with binding=True to find out any significant differences. The most dramatic difference is multiplication for generic matrices. More precisely, with the following command:

python -m timeit -s "from sage.all import MatrixSpace, GF; M = MatrixSpace(GF(9), 200).random_element()" "M * M"

With binding=False, I got: 10 loops, best of 3: 692 msec per loop
With binding=True, I got: 10 loops, best of 3: 1.16 sec per loop

This is a big regression which should be gone completely with PEP 580.

I should mention that this was done on Python 2.7.15 (SageMath is not yet ported to Python 3) but I see no reason why the conclusions shouldn't be valid for newer Python versions. I used SageMath 8.3.rc1 and Cython 0.28.4.

I hope that this finally shows that the problems mentioned in PEP 579 are real.

Jeroen.
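[Editor's note: the distinction Jeroen describes can be seen from plain CPython, without Cython. The class below, BoundLike, is a hypothetical stand-in for the kind of custom callable class that Cython's binding=True produces; it is not Cython's actual implementation, only a sketch of why such calls miss the interpreter's fast path.]

```python
# Plain CPython illustration (no Cython required): the callable types
# the interpreter fast-paths, versus a custom callable class whose
# calls go through the generic tp_call slot.

print(type(len))        # <class 'builtin_function_or_method'>
print(type(str.upper))  # <class 'method_descriptor'>

class BoundLike:
    """Hypothetical stand-in for a binding=True Cython function:
    a custom class whose calls are NOT specially optimized."""
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        # Dispatched via the generic calling convention, not the
        # fast path reserved for the builtin types above.
        return self.func(*args, **kwargs)

bound_len = BoundLike(len)
print(bound_len([1, 2, 3]))  # same result (3), slower calling convention
```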
On Sun, Jul 22, 2018 at 1:28 AM Jeroen Demeyer wrote:
Hello,
I finally managed to get some real-life benchmarks for why we need a faster C calling protocol (see PEPs 576, 579, 580).
Good job. But I am already +1 on adding support for an extension callable type. Do you think this benchmark could be optimized further by future optimizations that are possible with PEP 580 but not with PEP 576?
I should mention that this was done on Python 2.7.15 (SageMath is not yet ported to Python 3) but I see no reason why the conclusions shouldn't be valid for newer Python versions. I used SageMath 8.3.rc1 and Cython 0.28.4.
Do you mean you backported LOAD_METHOD and fastcall to Python 2.7 for benchmarking? Reproducing that seems like a hard job to me...
--
INADA Naoki
I don’t think we can safely assume Python 3.7 has the same performance,
actually. A lot has changed.
--Guido (mobile)
On Sat, Jul 21, 2018 at 6:30 PM Jeroen Demeyer wrote:
Hello,
I finally managed to get some real-life benchmarks for why we need a faster C calling protocol (see PEPs 576, 579, 580).
I focused on the Cython compilation of SageMath. By default, a function in Cython is an instance of builtin_function_or_method (analogously, method_descriptor for a method), which has special optimizations in the CPython interpreter. But the option "binding=True" changes those to a custom class which is NOT optimized.
I ran the full SageMath testsuite several times without and with binding=True to find out any significant differences. The most dramatic difference is multiplication for generic matrices. More precisely, with the following command:
python -m timeit -s "from sage.all import MatrixSpace, GF; M = MatrixSpace(GF(9), 200).random_element()" "M * M"
With binding=False, I got 10 loops, best of 3: 692 msec per loop
With binding=True, I got 10 loops, best of 3: 1.16 sec per loop
This is a big regression which should be gone completely with PEP 580.
I should mention that this was done on Python 2.7.15 (SageMath is not yet ported to Python 3) but I see no reason why the conclusions shouldn't be valid for newer Python versions. I used SageMath 8.3.rc1 and Cython 0.28.4.
I haven't fully caught up on the thread yet, so this might already be a moot point. But just in case it isn't: the Python 3 port of Sage works well enough (at least on my branch) that the above benchmark runs, and it would probably be worth repeating there. It's currently on Python 3.6.1, but upgrading to 3.7 probably wouldn't break the example benchmark either.
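[Editor's note: the call-overhead component of the regression can also be measured standalone on any recent CPython, with no SageMath or Cython involved. The following is a hypothetical micro-benchmark in the same spirit as Jeroen's timeit command; Wrapped is an illustrative custom callable, not code from Sage or Cython.]

```python
# Standalone micro-benchmark sketch: raw call overhead of a builtin
# (interpreter fast path) versus an equivalent custom callable object
# (generic tp_call dispatch). Requires only the standard library.
import timeit

class Wrapped:
    """Illustrative custom callable that forwards to a builtin."""
    def __init__(self, func):
        self.func = func
    def __call__(self, x):
        return self.func(x)

wrapped_abs = Wrapped(abs)

n = 1_000_000
builtin_time = timeit.timeit("abs(-1)", number=n)
wrapped_time = timeit.timeit("wrapped_abs(-1)", globals=globals(), number=n)
print(f"builtin: {builtin_time:.3f}s, wrapped: {wrapped_time:.3f}s")
```

On a typical CPython build the wrapped call is noticeably slower per call, which is the kind of per-call overhead PEP 580 aims to eliminate for C-implemented callables.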
participants (4)
- Erik Bray
- Guido van Rossum
- INADA Naoki
- Jeroen Demeyer