After much testing I found what is causing the regression in 16.04 and later.  There are several distinct causes, attributable to choices made in debian/rules and to changes in GCC.

Cause #1: the decision to compile `Modules/_math.c` with `-fPIC` *and* link it statically into the python executable [1].  This causes the majority of the slowdown.  This may be a bug in GCC or simply a constraint; I didn't find anything specific on this topic, although there are many old bug reports about the interaction of `-fPIC` with `-flto`.
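To make the difference concrete, a minimal sketch of the two ways of building that object file (the flags are illustrative, not copied from debian/rules):

```shell
# Upstream-style: _math.o built without -fPIC, linked statically into python
gcc -c -O2 -flto -I. -IInclude Modules/_math.c -o Modules/_math.o

# Debian-style: same object, but position-independent code
gcc -c -O2 -flto -fPIC -I. -IInclude Modules/_math.c -o Modules/_math.o
```

The object ends up linked statically into the python executable either way; only the `-fPIC` variant shows the slowdown.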

Cause #2: enabling `fpectl` [2], specifically passing `--with-fpectl` to `configure`.  fpectl is disabled by default upstream and its use is discouraged.  Yet Debian builds enable it unconditionally, and it seems to cause a significant performance degradation.  The effect is much less noticeable on 14.04 with GCC 4.8.0, but on more recent releases the performance difference seems to be larger.
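For reference, the two configurations side by side (other configure options omitted for brevity):

```shell
./configure                    # upstream default: fpectl stays disabled
./configure --with-fpectl      # what debian/rules passes unconditionally
```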

Plausible cause #3: stronger stack-smashing protection in 16.04, which uses `-fstack-protector-strong`, whereas 14.04 and earlier used `-fstack-protector` (with lower performance overhead).

Also, debian/rules limits the scope of PGO's PROFILE_TASK to 377 test suites versus upstream's 397, which affects performance somewhat negatively, though this is not definitive.  What are the reasons behind trimming the set of tests used for PGO?
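For anyone who wants to experiment: PROFILE_TASK can be overridden on the make command line, so the trimmed and full test selections are easy to compare.  A sketch (the test names here are placeholders, not the actual debian/rules selection):

```shell
# Build with PGO, restricting the profiling run to a chosen subset of tests
make profile-opt PROFILE_TASK="-m test.regrtest test_array test_math"
```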

Without fpectl, and without -fPIC on _math.c, 2.7.12 built on 16.04 is slower than stock 2.7.6 on 14.04 by only about 0.9% in my pyperformance runs [3], in contrast to the whopping 7.95% slowdown when comparing the stock versions.

Finally, a vanilla Python 2.7.12 build using GCC 5.4.0, default CFLAGS, default PROFILE_TASK and default Modules/Setup.local consistently runs faster in benchmarks than 2.7.6 (by about 0.7%), but I was not able to pinpoint the exact reason for this difference.

Note: the percentages above are the relative change in the geometric mean of pyperformance benchmark results.
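To make that concrete, here is a small sketch of how such a percentage is derived from per-benchmark time ratios (the ratios below are made up for illustration, not the actual benchmark data):

```python
import math

# Made-up per-benchmark time ratios (new/old); > 1.0 means the new build
# is slower on that benchmark.
ratios = [1.05, 0.98, 1.10, 1.02]

# Geometric mean, computed via the mean of the logs
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Relative change of the geometric mean, as a percentage
pct_change = (geomean - 1.0) * 100
```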


On Fri, Mar 3, 2017 at 10:27 AM, Louis Bouchard <> wrote:

On 03/03/2017 at 15:37, Louis Bouchard wrote:
> Hello,
> On 03/03/2017 at 15:31, Victor Stinner wrote:
>>> Out of curiosity, I ran the set of benchmarks in two LXC containers running
>>> centos7 (2.7.5 + gcc 4.8.5) and Fedora 25 (2.7.13 + gcc 6.3.x). The benchmarks
>>> do run faster in 18 benchmarks, slower on 12 and insignificant for the rest (~33
>>> from memory).
>> "faster" or "slower" is relative: I would like to see the ?.??x
>> faster/slower or percent value. Can you please share the result? I
>> don't know what is the best output:
>>   python3 -m performance compare centos.json fedora.json
>> or the new:
>>   python3 -m perf compare_to centos.json fedora.json --table --quiet
>> Victor
> All the results, including the latest are in the spreadsheet here (cited in the
> analysis document) :
> Third column is the ?.??x value that you are looking for, taken directly out of
> the 'pyperformance analyze' results.
> I didn't know about the new options, I'll give it a spin & see if I can get a
> better format.

All the benchmark data using the new format have been uploaded to the
spreadsheet. Each sheet is prefixed with pct_.


Kind regards,


Louis Bouchard
Software engineer, Cloud & Sustaining eng.
Canonical Ltd
Ubuntu developer                       Debian Maintainer
GPG : 429D 7A3B DD05 B6F8 AF63  B9C4 8B3D 867C 823E 7A61
Python-Dev mailing list