[New-bugs-announce] [issue38980] Compile libpython with -fno-semantic-interposition

STINNER Victor report at bugs.python.org
Thu Dec 5 11:00:31 EST 2019


New submission from STINNER Victor <vstinner at python.org>:

The Fedora packaging has been modified to compile libpython with the -fno-semantic-interposition flag: it makes Python up to 1.3x faster without touching a single line of the C code! See pyperformance results:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup#Benefit_to_Fedora

The main drawback is that -fno-semantic-interposition prevents overriding Python symbols with a custom library preloaded through LD_PRELOAD, for example overriding the PyErr_Occurred() function.
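
As an illustration of the kind of override that is affected, here is a minimal sketch of an LD_PRELOAD shim. It is not taken from any real tool: the file name shim.c, the counter shim_call_count and the build commands in the comment are assumptions, and the PyObject typedef is only an opaque stand-in so that the sketch does not need Python.h. Once libpython is built with -fno-semantic-interposition, calls to PyErr_Occurred() made from inside libpython can be bound directly or inlined, in which case they never reach such a shim.

```c
/* shim.c: hypothetical LD_PRELOAD shim (illustration only).
 * Possible build: gcc -shared -fPIC shim.c -o shim.so -ldl
 * Possible use:   LD_PRELOAD=./shim.so python3 script.py
 */
#define _GNU_SOURCE          /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stddef.h>

/* Opaque stand-in so that the sketch does not need Python.h. */
typedef struct _object PyObject;

typedef PyObject *(*pyerr_occurred_fn)(void);

/* Hypothetical counter: how many times the shim was entered. */
unsigned long shim_call_count = 0;

PyObject *PyErr_Occurred(void)
{
    static pyerr_occurred_fn real = NULL;

    shim_call_count++;
    if (real == NULL) {
        /* Look up the next definition in search order
         * (normally the one provided by libpython). */
        real = (pyerr_occurred_fn)dlsym(RTLD_NEXT, "PyErr_Occurred");
    }
    /* If no other definition is loaded, report "no error". */
    return real != NULL ? real() : NULL;
}
```

Callers outside libpython (for example an extension module) would normally still resolve PyErr_Occurred() through the dynamic linker and so could still see the shim; the flag only changes how libpython's own code is compiled.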

We (authors of the Fedora change) failed to find any use case for LD_PRELOAD.

To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to track memory allocations in Python 2.7. This use case is no longer relevant in Python 3 with PEP 445 which provides a supported C API to override Python memory allocators or to install hooks on Python memory allocators. Moreover, tracemalloc is a nice way to track memory allocations.

Is anyone aware of any special use of LD_PRELOAD with libpython?

To be clear: -fno-semantic-interposition only impacts libpython. All other libraries still respect LD_PRELOAD. For example, it is still possible to override glibc malloc/free.

Why does -fno-semantic-interposition make Python faster? There are multiple reasons. First of all, libpython makes a lot of function calls to libpython. Like really a lot, especially in the hot code paths. Without -fno-semantic-interposition, function calls from libpython to libpython have to go through "interposition": for example, "Procedure Linkage Table" (PLT) indirection on Linux. Interposition also prevents function inlining, which has a major impact on performance (missed optimization). In short, even with PGO and LTO, libpython function calls have two performance "penalties":

* indirect function calls (PLT)
* no inlining

I'm comparing the performance of a "statically linked Python" (Debian/Ubuntu choice: don't use "./configure --enable-shared", python is not linked to libpython) to a "dynamically linked Python" (Fedora choice: use "./configure --enable-shared", python is dynamically linked to libpython).

With -fno-semantic-interposition, function calls are direct and can be inlined when appropriate. You don't have to trust me, look at pyperformance benchmark results ;-)
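
To make the two penalties concrete, here is a small sketch of a shared library whose hot loop calls another exported function of the same library, the way libpython calls into libpython. The file name libdemo.c and the functions demo_add() and demo_sum_to() are made up for this example, and the build commands in the comment are only one possible way to compare the generated code.

```c
/* libdemo.c: hypothetical two-function shared library.
 *
 * Possible comparison with GCC:
 *   gcc -O2 -fPIC -shared libdemo.c -o libdemo.so
 *   gcc -O2 -fPIC -shared -fno-semantic-interposition libdemo.c -o libdemo-nsi.so
 * then inspect demo_sum_to with: objdump -d libdemo.so
 */

/* Exported symbol: by default another library loaded earlier
 * (for example through LD_PRELOAD) may replace it at load time. */
int demo_add(int a, int b)
{
    return a + b;
}

/* Hot loop calling demo_add() from inside the same library.
 * Without the flag, GCC has to assume demo_add() may be replaced,
 * so it typically keeps a real call going through the PLT.
 * With the flag, the compiler may call demo_add() directly or
 * inline it into the loop. */
int demo_sum_to(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total = demo_add(total, i);
    }
    return total;
}
```

The result is the same in both builds, for example demo_sum_to(5) returns 0+1+2+3+4 = 10; only the cost of each call to demo_add() differs.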

When using ./configure --enable-shared (libpython), the "python" binary is exactly one function call and that's all:

#include "Python.h"

int main(int argc, char **argv)
{ return Py_BytesMain(argc, argv); }

So 100% of the time is spent in libpython.

For a longer rationale, see the accepted Fedora change:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup

----------
components: Build
messages: 357856
nosy: inada.naoki, pablogsal, serhiy.storchaka, vstinner
priority: normal
severity: normal
status: open
title: Compile libpython with -fno-semantic-interposition
type: performance
versions: Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38980>
_______________________________________

