[New-bugs-announce] [issue45116] Performance regression 3.10b1 and later on Windows

neonene report at bugs.python.org
Mon Sep 6 11:27:18 EDT 2021


New submission from neonene <nicesalmon at gmail.com>:

pyperformance on Windows shows a slowdown between 3.10a7 and 3.10b1.
The ratios below are relative to 3.10a7 (higher is slower).

-------------------------------------------------
Windows x64     |  PGO   release  official-binary
----------------+--------------------------------
20210405        |
    3.10a7      |  1.00   1.24    1.00 (PGO?)
20210408-07:58  |
    b98eba5     |  0.98
20210408-10:22  |
  * PR25244     |  1.04
20210503        |
    3.10b1      |  1.07   1.21    1.07
-------------------------------------------------
Windows x86     |  PGO   release  official-binary
----------------+--------------------------------
20210405        |
    3.10a7      |  1.00   1.25    1.27 (release?)
20210408-07:58  |
    b98eba5bc   |  1.00
20210408-10:22  |
  * PR25244     |  1.11
20210503        |
    3.10b1      |  1.14   1.28    1.29
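
For readers who want to reproduce ratios like the ones in these tables, here is a minimal sketch. It assumes per-benchmark timings in seconds are already available as arrays, and it averages the per-benchmark ratios with a plain arithmetic mean. The report does not say how its ratios were aggregated, so the aggregation, the function name slowdown_ratio and its parameters are assumptions made for illustration (a geometric mean is another common choice).

```c
#include <stddef.h>

/* Mean of per-benchmark time ratios (candidate / baseline).
 * A result above 1.0 means the candidate build is slower on average.
 * Assumption: arithmetic mean; the report does not state its method. */
double slowdown_ratio(const double *baseline_sec,
                      const double *candidate_sec,
                      size_t n)
{
    double sum = 0.0;

    if (n == 0) {
        return 1.0;  /* nothing to compare: treat as "no change" */
    }
    for (size_t i = 0; i < n; i++) {
        sum += candidate_sec[i] / baseline_sec[i];
    }
    return sum / (double)n;
}
```

Passing the 3.10a7 timings as baseline_sec makes 3.10a7 itself come out as 1.00, matching the convention used in the tables.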

Since PR25244 (28d28e053db6b69d91c2dfd579207cd8ccbc39e7),
_PyEval_EvalFrameDefault() in ceval.c no longer seems to be fully optimized by PGO (msvc14.29.16.10).
At least the functions below are no longer inlined there at all.

  (1) _Py_DECREF()                 (from Py_DECREF,Py_CLEAR,Py_SETREF)
  (2) _Py_XDECREF()                (from Py_XDECREF,SETLOCAL)
  (3) _Py_IS_TYPE()                (from PyXXX_CheckExact)
  (4) _Py_atomic_load_32bit_impl() (from CHECK_EVAL_BREAKER)
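
To illustrate why inlining matters for helpers like these, here is a minimal sketch contrasting a static inline function with an equivalent macro. It uses a toy reference-counted struct, not CPython's PyObject. All names (toy_object, toy_decref, TOY_DECREF, toy_demo) are hypothetical and are not the definitions used in CPython.

```c
#include <stddef.h>

/* Toy stand-in for a reference-counted object; NOT CPython's PyObject. */
typedef struct toy_object {
    long refcnt;
    void (*dealloc)(struct toy_object *);
} toy_object;

/* Function form: the compiler decides per call site whether to inline.
 * If it declines, every decrement in a hot loop pays for a real call. */
static inline void toy_decref(toy_object *op)
{
    if (--op->refcnt == 0) {
        op->dealloc(op);
    }
}

/* Macro form: the preprocessor pastes the body into the caller, so there
 * is no call left for the optimizer to decline to inline. */
#define TOY_DECREF(op)                      \
    do {                                    \
        toy_object *toy_tmp_ = (op);        \
        if (--toy_tmp_->refcnt == 0) {      \
            toy_tmp_->dealloc(toy_tmp_);    \
        }                                   \
    } while (0)

static int toy_freed = 0;

/* Deallocator for the demo: only counts how often it is called. */
static void toy_count_dealloc(toy_object *op)
{
    (void)op;
    toy_freed++;
}

/* Both forms have the same effect; returns the number of deallocations. */
int toy_demo(void)
{
    toy_object a = {2, toy_count_dealloc};
    toy_object b = {1, toy_count_dealloc};

    toy_freed = 0;
    toy_decref(&a);  /* 2 -> 1, no dealloc */
    TOY_DECREF(&a);  /* 1 -> 0, dealloc */
    TOY_DECREF(&b);  /* 1 -> 0, dealloc */
    return toy_freed;
}
```

The two forms behave identically. The difference is only in who controls code placement: the optimizer for the function, the preprocessor for the macro.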

I tried other linker options such as thread-safe-profiling, aggressive-code-generation, and /OPT:NOREF, without success.
3.10a7 can inline them in the eval-loop even when profiling only test_array.py.

I measured the overhead of (1) to (4) using my own build, whose eval-loop uses macros in place of these functions.

-----------------------------------------------------------------
Windows x64     |  PGO   patched  overhead in eval-loop
----------------+------------------------------------------------
    3.10a7      |  1.00
20210802        |
    3.10rc1     |  1.09   1.05    4%  (slow 43, fast  5, same 10)
20210831-20:42  |
    863154c     |  0.95   0.90    5%  (slow 48, fast  3, same  7)
   (3.11a0+)    |
-----------------------------------------------------------------
Windows x86     |  PGO   patched  overhead in eval-loop
----------------+------------------------------------------------
    3.10a7      |  1.00
20210802        |
    3.10rc1     |  1.15   1.13    2%  (slow 29, fast 14, same 15)
20210831-20:42  |
    863154c     |  1.05   1.02    3%  (slow 44, fast  7, same  7)
   (3.11a0+)    |
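
The overhead column and the slow/fast/same counts could be derived along the following lines. This is a sketch under stated assumptions: the report does not give its formula, so the overhead is taken here as the gap between the PGO and patched ratios relative to the PGO ratio (which reproduces the rounded percentages in the tables, possibly by coincidence), and the counts are read as per-benchmark comparisons of the PGO build against the patched build. The names overhead_percent, tally, classify and the tolerance parameter are hypothetical.

```c
#include <stddef.h>

/* Overhead of the un-inlined helpers as a percentage of the PGO ratio.
 * Assumption: the report does not state the formula it used. */
double overhead_percent(double pgo_ratio, double patched_ratio)
{
    return (pgo_ratio - patched_ratio) / pgo_ratio * 100.0;
}

typedef struct {
    int slow;  /* PGO build slower than the patched build */
    int fast;  /* PGO build faster than the patched build */
    int same;  /* within the tolerance band */
} tally;

/* Classify each benchmark by the ratio pgo / patched.
 * tolerance is a hypothetical band, e.g. 0.02 for +/- 2%. */
tally classify(const double *pgo_sec, const double *patched_sec,
               size_t n, double tolerance)
{
    tally t = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        double ratio = pgo_sec[i] / patched_sec[i];
        if (ratio > 1.0 + tolerance) {
            t.slow++;
        } else if (ratio < 1.0 - tolerance) {
            t.fast++;
        } else {
            t.same++;
        }
    }
    return t;
}
```

For example, overhead_percent(1.09, 1.05) gives roughly 3.7, which rounds to the 4% shown for 3.10rc1 on x64.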

----------
components: C API, Interpreter Core, Windows
files: 310rc1_confirm_overhead.patch
keywords: patch
messages: 401143
nosy: Mark.Shannon, neonene, pablogsal, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
priority: normal
severity: normal
status: open
title: Performance regression 3.10b1 and later on Windows
type: performance
versions: Python 3.10, Python 3.11
Added file: https://bugs.python.org/file50263/310rc1_confirm_overhead.patch

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45116>
_______________________________________

