A fast startup patch (was: Python startup time)
Hello,

Yesterday Neil Schemenauer mentioned some work that a colleague of mine (CCed) and I have done to improve CPython start-up time. Given the recent discussion, it seems timely to discuss what we are doing and whether it is of interest to other people hacking on the CPython runtime.

There are many ways to reduce the start-up time overhead. For this experiment, we are specifically targeting the cost of unmarshaling heap objects from compiled Python bytecode. Our measurements show this specific cost to represent 10% to 25% of the start-up time among the applications we have examined.

Our approach to eliminating this overhead is to store unmarshaled objects into the data segment of the python executable. We do this by processing the compiled python bytecode for a module, creating native object code with the unmarshaled objects in their in-memory representation, and linking this into the python executable.

When a module is imported, we simply return a pointer to the top-level code object in the data segment directly without invoking the unmarshaling code or touching the file system. What we are doing is conceptually similar to the existing capability to freeze a module, but we avoid non-trivial unmarshaling costs.

The patch is still under development and there is still a little bit more work to do. With that in mind, the numbers look good, but please take these with a grain of salt.

Baseline

$ bench "./python.exe -c ''"
benchmarking ./python.exe -c ''
time                 31.46 ms   (31.24 ms .. 31.78 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 32.08 ms   (31.82 ms .. 32.63 ms)
std dev              778.1 μs   (365.6 μs .. 1.389 ms)

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 32.82 ms   (32.64 ms .. 33.02 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 33.17 ms   (33.01 ms .. 33.44 ms)
std dev              430.7 μs   (233.8 μs .. 675.4 μs)

With our patch

$ bench "./python.exe -c ''"
benchmarking ./python.exe -c ''
time                 24.86 ms   (24.62 ms .. 25.08 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 25.58 ms   (25.36 ms .. 25.94 ms)
std dev              592.8 μs   (376.2 μs .. 907.8 μs)

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 25.30 ms   (25.00 ms .. 25.55 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 26.78 ms   (26.30 ms .. 27.64 ms)
std dev              1.413 ms   (747.5 μs .. 2.250 ms)
variance introduced by outliers: 20% (moderately inflated)

Here are some numbers with the patch, but with the stat calls preserved to isolate just the marshaling effects.

Baseline

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 34.67 ms   (33.17 ms .. 36.52 ms)
                     0.995 R²   (0.990 R² .. 1.000 R²)
mean                 35.36 ms   (34.81 ms .. 36.25 ms)
std dev              1.450 ms   (1.045 ms .. 2.133 ms)
variance introduced by outliers: 12% (moderately inflated)

With our patch (and calls to stat)

$ bench "./python.exe -c 'import difflib'"
benchmarking ./python.exe -c 'import difflib'
time                 30.24 ms   (29.02 ms .. 32.66 ms)
                     0.988 R²   (0.968 R² .. 1.000 R²)
mean                 31.86 ms   (31.13 ms .. 32.75 ms)
std dev              1.789 ms   (1.329 ms .. 2.437 ms)
variance introduced by outliers: 17% (moderately inflated)

(This work was done in CPython 3.6 and we are exploring back-porting to 2.7 so we can run the hg startup benchmarks in the performance test suite.)

This is effectively a drop-in replacement for the frozen module capability and (so far) required only minimal changes to the runtime. To us, it seems like a very nice win without compromising on compatibility or complexity.
I am happy to discuss more of the technical details until we have a public patch available. I hope this provides some optimism around the possibility of improving the start-up time of CPython. What do you all think? Kindly, Carl
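For a rough sense of the cost being targeted here, one can time marshal.loads() on a module's cached bytecode from a stock interpreter. A minimal sketch, assuming a 3.6-style .pyc whose header (magic + mtime + source size) is 12 bytes (3.7+ uses 16) and that the .pyc has already been generated:

```python
# Hedged sketch: estimate the unmarshal cost the patch eliminates by timing
# marshal.loads() on difflib's cached bytecode.
import importlib.util
import marshal
import time

source = importlib.util.find_spec("difflib").origin
with open(importlib.util.cache_from_source(source), "rb") as f:
    payload = f.read()[12:]  # skip the .pyc header (3.6 layout)

start = time.perf_counter()
code = marshal.loads(payload)  # the work the patch moves to link time
elapsed = (time.perf_counter() - start) * 1e3
print("unmarshaled %s in %.3f ms" % (code.co_filename, elapsed))
```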
+1 to the concept!

On Thu, May 3, 2018 at 1:13 PM, Carl Shapiro <carl.shapiro@gmail.com> wrote:
Hello,
Yesterday Neil Schemenauer mentioned some work that a colleague of mine (CCed) and I have done to improve CPython start-up time. Given the recent discussion, it seems timely to discuss what we are doing and whether it is of interest to other people hacking on the CPython runtime.
There are many ways to reduce the start-up time overhead. For this experiment, we are specifically targeting the cost of unmarshaling heap objects from compiled Python bytecode. Our measurements show this specific cost to represent 10% to 25% of the start-up time among the applications we have examined.
Our approach to eliminating this overhead is to store unmarshaled objects into the data segment of the python executable. We do this by processing the compiled python bytecode for a module, creating native object code with the unmarshaled objects in their in-memory representation, and linking this into the python executable.
When a module is imported, we simply return a pointer to the top-level code object in the data segment directly without invoking the unmarshaling code or touching the file system. What we are doing is conceptually similar to the existing capability to freeze a module, but we avoid non-trivial unmarshaling costs.
The patch is still under development and there is still a little bit more work to do. With that in mind, the numbers look good, but please take these with a grain of salt.
Baseline
$ bench "./python.exe -c ''" benchmarking ./python.exe -c '' time 31.46 ms (31.24 ms .. 31.78 ms) 1.000 R² (0.999 R² .. 1.000 R²) mean 32.08 ms (31.82 ms .. 32.63 ms) std dev 778.1 μs (365.6 μs .. 1.389 ms)
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 32.82 ms (32.64 ms .. 33.02 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 33.17 ms (33.01 ms .. 33.44 ms) std dev 430.7 μs (233.8 μs .. 675.4 μs)
With our patch
$ bench "./python.exe -c ''" benchmarking ./python.exe -c '' time 24.86 ms (24.62 ms .. 25.08 ms) 0.999 R² (0.999 R² .. 1.000 R²) mean 25.58 ms (25.36 ms .. 25.94 ms) std dev 592.8 μs (376.2 μs .. 907.8 μs)
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 25.30 ms (25.00 ms .. 25.55 ms) 0.999 R² (0.998 R² .. 1.000 R²) mean 26.78 ms (26.30 ms .. 27.64 ms) std dev 1.413 ms (747.5 μs .. 2.250 ms) variance introduced by outliers: 20% (moderately inflated)
Here are some numbers with the patch, but with the stat calls preserved to isolate just the marshaling effects.
Baseline
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 34.67 ms (33.17 ms .. 36.52 ms) 0.995 R² (0.990 R² .. 1.000 R²) mean 35.36 ms (34.81 ms .. 36.25 ms) std dev 1.450 ms (1.045 ms .. 2.133 ms) variance introduced by outliers: 12% (moderately inflated)
With our patch (and calls to stat)
$ bench "./python.exe -c 'import difflib'" benchmarking ./python.exe -c 'import difflib' time 30.24 ms (29.02 ms .. 32.66 ms) 0.988 R² (0.968 R² .. 1.000 R²) mean 31.86 ms (31.13 ms .. 32.75 ms) std dev 1.789 ms (1.329 ms .. 2.437 ms) variance introduced by outliers: 17% (moderately inflated)
(This work was done in CPython 3.6 and we are exploring back-porting to 2.7 so we can run the hg startup benchmarks in the performance test suite.)
This is effectively a drop-in replacement for the frozen module capability and (so far) required only minimal changes to the runtime. To us, it seems like a very nice win without compromising on compatibility or complexity. I am happy to discuss more of the technical details until we have a public patch available.
I hope this provides some optimism around the possibility of improving the start-up time of CPython. What do you all think?
Kindly,
Carl
On 4 May 2018 at 06:13, Carl Shapiro <carl.shapiro@gmail.com> wrote:
Hello,
Yesterday Neil Schemenauer mentioned some work that a colleague of mine (CCed) and I have done to improve CPython start-up time. Given the recent discussion, it seems timely to discuss what we are doing and whether it is of interest to other people hacking on the CPython runtime.
There are many ways to reduce the start-up time overhead. For this experiment, we are specifically targeting the cost of unmarshaling heap objects from compiled Python bytecode. Our measurements show this specific cost to represent 10% to 25% of the start-up time among the applications we have examined.
Our approach to eliminating this overhead is to store unmarshaled objects into the data segment of the python executable. We do this by processing the compiled python bytecode for a module, creating native object code with the unmarshaled objects in their in-memory representation, and linking this into the python executable.
When a module is imported, we simply return a pointer to the top-level code object in the data segment directly without invoking the unmarshaling code or touching the file system. What we are doing is conceptually similar to the existing capability to freeze a module, but we avoid non-trivial unmarshaling costs.
This definitely seems interesting, but is it something you'd see us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, May 4, 2018 at 5:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This definitely seems interesting, but is it something you'd see us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).
Yes, this would be a win for a conventional Python installation as well. Specifically, users and their scripts would enjoy a reduction in cold-startup time. In the numbers I showed yesterday, the version of the interpreter with our patch applied included unmarshaled data for the modules that always appear on the sys.modules list after an ordinary interpreter cold-start. I believe it is worthwhile to include that set of modules in the standard CPython interpreter build. Expanding that set to include the commonly imported modules might be an additional win, especially for short-running scripts.
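For reference, the set Carl describes can be enumerated on any build. A quick, unofficial sketch that lists what a bare cold start pulls into sys.modules:

```python
# List the modules present in sys.modules after a bare interpreter start --
# the natural candidates for baking into the executable's data segment.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sorted(sys.modules)))"],
    stdout=subprocess.PIPE,
)
print(result.stdout.decode())
```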
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter? Off the top of my head:

- We'd be making the in-memory layout of those objects part of the .pyc format, so we couldn't change that within a minor release. I suspect this wouldn't be a big change though, since we already commit to ABI compatibility for C extensions within a minor release? In principle there are some cases where this would be different (e.g. adding new fields at the end of an object is generally ABI compatible), but this might not be an issue for the types of objects we're talking about.

- There's some memory management concern, since these are, y'know, heap objects, and we wouldn't be heap-allocating them. The main constraint would be that you couldn't free them one at a time, but would have to free the whole block at once. But I think it at least wouldn't be too hard to track whether any of the objects in the block are still alive, and free the whole block if there aren't any. E.g., we could have an object flag that means "when this object is freed, don't call free(); instead find the containing block and decrement its live-object count." You probably need this flag even in the current version, right? (And the flag could also be an escape hatch if we did need to change object size: check for the flag before accessing the new fields.) Or maybe you could get clever tracking object liveness on a page-by-page basis; not sure it's worth it though. Unloading module-level objects is pretty rare.

- I'm assuming these objects can have pointers to each other, and to well-known constants like None, so you need some kind of relocation engine to fix those up. Right now I guess you're probably using the one built into the dynamic loader? In theory it shouldn't be too hard to write our own – basically just a list of offsets in the block where we need to add the base address or write the address of a well-known constant, I think? (A toy sketch of this idea follows the quoted message below.)

Anything else I'm missing?

On Fri, May 4, 2018, 16:06 Carl Shapiro <carl.shapiro@gmail.com> wrote:
On Fri, May 4, 2018 at 5:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This definitely seems interesting, but is it something you'd see us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).
Yes, this would be a win for a conventional Python installation as well. Specifically, users and their scripts would enjoy a reduction in cold-startup time.
In the numbers I showed yesterday, the version of the interpreter with our patch applied included unmarshaled data for the modules that always appear on the sys.modules list after an ordinary interpreter cold-start. I believe it is worthwhile to include that set of modules in the standard CPython interpreter build. Expanding that set to include the commonly imported modules might be an additional win, especially for short-running scripts.
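To make Nathaniel's relocation point concrete, here is a toy, purely illustrative sketch of the "list of offsets" idea: patching pointer slots in a pre-built block with a base address or a well-known constant's address. None of this is CPython API; all names are made up:

```python
# Toy relocation engine (illustrative only, not CPython code): a frozen
# block stores internal pointers as block-relative offsets plus a table of
# slots that must point at well-known singletons. At load time we patch
# each 8-byte slot in place.
import struct

def relocate(block, base, internal_fixups, constant_fixups):
    """Patch pointer slots in `block` (a bytearray) in place.

    internal_fixups -- offsets whose stored value is block-relative: add base
    constant_fixups -- {offset: absolute address} for shared singletons
    """
    for off in internal_fixups:
        (rel,) = struct.unpack_from("<Q", block, off)
        struct.pack_into("<Q", block, off, base + rel)
    for off, addr in constant_fixups.items():
        struct.pack_into("<Q", block, off, addr)

# One internal pointer at offset 0 referring to offset 32, and a slot at
# offset 8 that must point at the interpreter's single None object.
block = bytearray(64)
struct.pack_into("<Q", block, 0, 32)
relocate(block, base=0x7F0000000000, internal_fixups=[0],
         constant_fixups={8: id(None)})  # id() is the address in CPython
```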
On 5 May 2018 at 11:58, Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Off the top of my head:
We'd be making the in-memory layout of those objects part of the .pyc format, so we couldn't change that within a minor release. I suspect this wouldn't be a big change though, since we already commit to ABI compatibility for C extensions within a minor release? In principle there are some cases where this would be different (e.g. adding new fields at the end of an object is generally ABI compatible), but this might not be an issue for the types of objects we're talking about.
I'd frame this one a bit differently: what if we had a platform-specific variant of the pyc format that was essentially a frozen module packaged as an extension module? We probably couldn't quite do that for arbitrary Python modules *today* (due to the remaining capability differences between regular modules and extension modules), but multi-phase initialisation gets things *much* closer to parity, and running embedded bytecode instead of accessing the C API directly should avoid the limitations that exist for classes defined in C.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
I ask because arch specific byte code files are a big change in consumers' expectations. It's not necessarily a bad change but it should be communicated to downstreams so they can decide how to adjust to it. Linux distros which ship byte code files will need to build them for each arch, for instance. People who ship just the byte code as an obfuscation of the source code will need to decide whether to ship packages for each arch they care about or change how they distribute.

-Toshio
On Sat, May 5, 2018, 11:34 Toshio Kuratomi <a.badger@gmail.com> wrote:
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
I ask because arch specific byte code files are a big change in consumers' expectations. It's not necessarily a bad change but it should be communicated to downstreams so they can decide how to adjust to it.
Linux distros which ship byte code files will need to build them for each arch, for instance. People who ship just the byte code as an obfuscation of the source code will need to decide whether to ship packages for each arch they care about or change how they distribute.
That's a good point. One way to minimize the disruption would be to include both the old and new info in the .pyc files, so at load time if the new version is incompatible then you can fall back on the old way, even if it's a bit slower.

I think in the vast majority of cases currently .pyc files are built on the same architecture where they're used? Pip and Debian/Ubuntu and the interpreter's automatic compilation-on-import all build .pyc files on the computer where they'll be run.

It might also be worth double-checking how much the memory layout of these objects even varies. Obviously it'll be different for 32- and 64-bit systems, but beyond that, most ISAs and OSes and compilers use pretty similar struct layout rules AFAIK... we're not talking about actual machine code.

-n
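A trivial way to inspect the layout-relevant properties Nathaniel mentions on any given build (just the obvious knobs, as a sketch):

```python
# The layout-relevant properties of the running build: pointer width,
# byte order, and a couple of object sizes that follow from them.
import struct
import sys

print("pointer width:", struct.calcsize("P") * 8, "bits")
print("byte order:   ", sys.byteorder)
print("sizeof(int 0):", sys.getsizeof(0), "bytes")
print("sizeof('')   :", sys.getsizeof(""), "bytes")
```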
On 5.5.2018 21:00, Nathaniel Smith wrote:
I think in the vast majority of cases currently .pyc files are built on the same architecture where they're used?
On Fedora (and by extension also on RHEL and CentOS) this is not true. When the package is noarch (no extension module shipped, only pure Python) it is built and bytecompiled on a random architecture. Bytecompilation happens during build time. If bytecode gets arch specific, we'd need to make all our Python packages arch specific or switch to install-time bytecompilation.

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok
On 5/5/2018 2:33 PM, Toshio Kuratomi wrote:
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:

On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:

On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific... (But that would cost more stat calls.)
I ask because arch specific byte code files are a big change in consumers' expectations. It's not necessarily a bad change but it should be communicated to downstreams so they can decide how to adjust to it.
Linux distros which ship byte code files will need to build them for each arch, for instance. People who ship just the byte code as an obfuscation of the source code will need to decide whether to ship packages for each arch they care about or change how they distribute.
It is an advertised feature that CPython *can* produce cross-platform version-specific .pyc files. I believe this should continue, at least for a few releases. They are currently named modname.cpython-xy.pyc, with optional '.opt-1' and '.opt-2' tags inserted before '.pyc', in __pycache__. These name formats should continue to mean what they do now. I believe *can* should not mean *always*.

Architecture-specific files will need an additional architecture tag, such as win32 and win64, anyway. Or would bitness and endianness be sufficient across platforms?

If we make architecture-specific the default, we could add startup, compile, and compileall options for the cross-platform format. Or maybe add a recompile function that imports cross-platform .pycs and outputs local-architecture .pycs.

--
Terry Jan Reedy
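The current naming scheme Terry describes is visible through importlib; the arch-tagged variant in the trailing comment below is hypothetical, not an existing format:

```python
# How CPython names tagged .pyc files today (output shown for a 3.6 build);
# an architecture tag would presumably be one more dash-separated field.
import importlib.util

print(importlib.util.cache_from_source("mod.py"))
# -> __pycache__/mod.cpython-36.pyc
print(importlib.util.cache_from_source("mod.py", optimization=2))
# -> __pycache__/mod.cpython-36.opt-2.pyc
# hypothetical arch-specific name: __pycache__/mod.cpython-36-win64.pyc
```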
On Sat, 5 May 2018 at 10:41 Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific...
.pyc files have tags to specify details about them (e.g. were they compiled with -OO), so this isn't an "all or nothing" option, nor does it require a different file extension. There just needs to be an appropriate finder that knows how to recognize a .pyc file with the appropriate tag that can be used, and then a loader that knows how to read that .pyc.
(But that would cost more stat calls.)
Nope, we actually cache directory contents, so file existence lookups are essentially free (this is why importlib.invalidate_caches() exists: specifically to work around cases where the timestamp is too coarse to detect a directory content mutation).

-Brett
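A rough sketch of the finder/loader shape Brett describes. The arch-specific suffix and the marshal fallback are assumptions for illustration, not an existing CPython mechanism:

```python
# Illustrative finder/loader pair for a hypothetical arch-tagged .pyc.
# The suffix below is made up; a real implementation would map the payload
# and fix up pointers rather than fall back to marshal as done here.
import importlib.abc
import importlib.util
import marshal
import os
import sys

ARCH_SUFFIX = ".cpython-36-x86_64.pyc"  # hypothetical tag

class PreheatedLoader(importlib.abc.Loader):
    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        with open(self.path, "rb") as f:
            payload = f.read()
        code = marshal.loads(payload[12:])  # stand-in for the fast path
        exec(code, module.__dict__)

class PreheatedFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        name = fullname.rpartition(".")[2]
        for entry in (path or sys.path):
            candidate = os.path.join(entry, "__pycache__", name + ARCH_SUFFIX)
            if os.path.exists(candidate):
                return importlib.util.spec_from_loader(
                    fullname, PreheatedLoader(candidate))
        return None  # fall through to the regular finders

sys.meta_path.insert(0, PreheatedFinder())
```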
On 6 May 2018 at 05:34, Brett Cannon <brett@python.org> wrote:
On Sat, 5 May 2018 at 10:41 Eric Fahlgren <ericfahlgren@gmail.com> wrote:
On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Or have parallel "pyh" (Python "heap") files, that are architecture specific...
.pyc files have tags to specify details about them (e.g. were they compiled with -OO), so this isn't an "all or nothing" option, nor does it require a different file extension. There just needs to be an appropriate finder that knows how to recognize a .pyc file with the appropriate tag that can be used, and then a loader that knows how to read that .pyc.
Right, this is the kind of change I had in mind (perhaps in combination with Diana Clarke's suggestion from several months back to make pyc tagging more feature-flag centric, rather than the current focus on a numeric optimisation level). We also wouldn't ever generate this hypothetical format implicitly - similar to the new deterministic pycs in 3.7, they'd be something you had to explicitly request via a compileall invocation.

In the Linux distro use case then, the relevant distro packaging helper scripts and macros could generate traditional cross-platform pyc files for no-arch packages, but automatically switch to the load-time optimised arch-specific format if the package was already arch-specific.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
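For comparison, today's explicit opt-in looks like this; the arch-specific flag in the comment is hypothetical:

```python
# Explicitly compiling a tree to .pyc, as compileall does today. A
# hypothetical arch-specific mode might hang off the same entry point,
# e.g. `python -m compileall --arch-specific Lib/` (made-up flag).
import compileall

compileall.compile_dir("Lib/", quiet=1)
```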
On 5/5/2018 10:30 AM, Toshio Kuratomi wrote:
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
Would this make .pyc files arch specific?
Lots of room in the __pycache__ folder. As compilation of the .py module proceeds, could it be determined if there is anything that needs to be architecture specific, and emit an architecture-specific one or an architecture-independent one as appropriate? Data structures are mostly bitness-dependent, no?

But if an architecture-specific .pyc is required, could/should it be structured and named according to the OS conventions also: .dll, .so, etc.? Even if it doesn't contain executable code, the bytecode could be contained in appropriate data sections, and there has been talk about doing relocation of pointers in such pre-compiled data structures, and the linker _already_ can do that sort of thing...
On Fri, May 4, 2018 at 6:58 PM, Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
The system we have developed can create a shared object file for each compiled Python file. However, such a representation is not directly usable. First, certain shared constants, such as interned strings, must be kept globally unique across object code files. Second, some marshaled objects, such as the hashed collections, must be initialized with randomization state that is not available until after the hosting runtime has been initialized.

We are able to work around the first issue by generating a heap image with the transitive closure of all modules that will be loaded, which allows us to easily maintain uniqueness guarantees. We are able to work around the second issue with some unobservable changes to the affected data structures.

Based on our numbers, it appears there should be some hesitancy--at this time--about changing the format of compiled Python files for the sake of load-time performance. In contrast, the data shows that a focused change to address file system inefficiencies has the potential to broadly and transparently deliver benefit to users without affecting existing code or workflows.
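Both constraints Carl names are visible from the Python level. A small demo (run it twice to see the second point):

```python
# First constraint: interned strings must stay globally unique. Two frozen
# images must not each carry a private copy of the same identifier.
import sys

a = sys.intern("some_identifier")
b = sys.intern("some_identifier")
assert a is b  # one object, shared process-wide

# Second constraint: string hashes are randomized per process (unless
# PYTHONHASHSEED is pinned), so dict/set bucket layout baked in at build
# time would be stale at run time. This prints a different value each run:
print(hash("some_identifier"))
```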
“the data shows that a focused change to address file system inefficiencies has the potential to broadly and transparently deliver benefit to users without affecting existing code or workflows.”

This is consistent with a Node.js experiment I heard about where they compiled an entire application into a single (HUGE!) .js file. Reading a single large file from disk is quicker than many small files on every significant file system I’m aware of.

Is there benefit to supporting import of .tar files as we currently do .zip? Or perhaps having a special fast-path for uncompressed .zip files?

Top-posted from my Windows phone
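Import-from-zip already works out of the box, for what it's worth. A minimal demonstration (the bundle name and module are arbitrary):

```python
# Bundle a module into an archive (uncompressed: ZIP_STORED is the default)
# and import it back through the stdlib's zipimport machinery.
import sys
import zipfile

with zipfile.ZipFile("bundle.zip", "w") as zf:
    zf.writestr("hello.py", "MESSAGE = 'hi from the bundle'\n")

sys.path.insert(0, "bundle.zip")
import hello
print(hello.MESSAGE)
```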
participants (11)

- Brett Cannon
- Carl Shapiro
- Eric Fahlgren
- Glenn Linderman
- Gregory P. Smith
- Miro Hrončok
- Nathaniel Smith
- Nick Coghlan
- Steve Dower
- Terry Reedy
- Toshio Kuratomi