[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?
Larry Hastings
larry at hastings.org
Fri Sep 14 17:27:37 EDT 2018
What follows is the text of issue 34690:
https://bugs.python.org/issue34690
The PR is here:
https://github.com/python/cpython/pull/9320
I don't know if we should be discussing this here on python-dev, or on
bpo, or on Zulip, or on the soon-to-be-created Discourse. But maybe we
can talk about it somewhere!
//arry/
----
This patch was sent to me privately by Jeethu Rao at Facebook. It's a
change they're working with internally to improve startup time. What
I've been told by Carl Shapiro at Facebook is that we have their
blessing to post it publicly / merge it / build upon it for CPython.
Their patch was written for 3.6; I have massaged it to the point where
it minimally works with 3.8.
What the patch does: it takes all the Python modules that are loaded as
part of interpreter startup and deserializes their marshalled .pyc files
into precreated objects stored as static C data. That data is emitted as
a .c file, which you add to the Python build. Then there's a patch to
Python itself (about 250 lines iirc) that teaches it to load modules from
these data structures.
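For comparison, CPython already has a related mechanism: frozen modules
(used for importlib's bootstrap and by Tools/freeze), which embed the
*marshalled bytes* of a code object as a C array. Those bytes still have
to be unmarshalled at startup, and that is the step this patch avoids by
doing the deserialization at build time. A minimal sketch of the older
bytes-in-a-C-array approach, assuming a hypothetical helper name
emit_c_array and an illustrative frozen_<name> symbol naming scheme:

```python
import marshal


def emit_c_array(name: str, source: str, filename: str = "<frozen>") -> str:
    """Return C source declaring the marshalled code object of `source`
    as a static byte array. Hypothetical helper, for illustration only:
    it stores bytes that must still be unmarshalled at runtime, unlike
    the patch, which stores precreated objects."""
    code = compile(source, filename, "exec")
    data = marshal.dumps(code)
    lines = [f"static const unsigned char frozen_{name}[] = {{"]
    # Emit 12 byte values per line to keep the generated C readable.
    for i in range(0, len(data), 12):
        chunk = ", ".join(str(b) for b in data[i:i + 12])
        lines.append(f"    {chunk},")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_c_array("hello", "print('hello')\n"))
```

The bytes are tied to the marshal format of the interpreter that
generated them, so the generated file has to be rebuilt whenever that
format or the source module changes.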
I wrote a quick dumb test harness to compare this patch vs 3.8 stock.
It runs a command line 500 times and uses time.perf_counter to time the
process. On a fast quiescent laptop I observe a 21-22% improvement:
cmdline: ['./python', '-c', 'pass']
500 runs:
sm38
average time 0.006302303705982922
best 0.006055746000129147
worst 0.00816565500008437
clean38
average time 0.007969956444008858
best 0.007829047999621253
worst 0.008812210000542109
improvement 0.20924239043734505 %
cmdline: ['./python', '-c', 'import io']
500 runs:
sm38
average time 0.006297688038004708
best 0.005980765999993309
worst 0.0072462130010535475
clean38
average time 0.007996319670004595
best 0.0078091849991324125
worst 0.009175700999549008
improvement 0.21242667903482038 %
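The harness itself isn't included here, so the following is only a
sketch of one that would produce numbers shaped like the ones above,
assuming subprocess.run for launching and time.perf_counter for timing.
Note that the printed "improvement" values are fractions of the stock
average, not percentages: (0.00797 - 0.00630) / 0.00797 is about 0.209,
i.e. roughly 21%, which matches the 21-22% figure quoted above.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmdline, runs=500):
    """Run `cmdline` `runs` times; return per-run wall-clock times in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmdline, check=True)
        times.append(time.perf_counter() - start)
    return times


def summarize(label, times):
    """Print average, best, and worst times in the same layout as above."""
    print(label)
    print("average time", statistics.mean(times))
    print("best", min(times))
    print("worst", max(times))


def improvement(patched, stock):
    """Fractional speedup relative to stock: 0.21 means 21% faster."""
    stock_avg = statistics.mean(stock)
    return (stock_avg - statistics.mean(patched)) / stock_avg


if __name__ == "__main__":
    # Assumption: in real use these would be a patched and a stock build
    # (e.g. two different ./python binaries). Here both are the current
    # interpreter, so the improvement should hover around zero.
    patched = time_command([sys.executable, "-c", "pass"], runs=5)
    stock = time_command([sys.executable, "-c", "pass"], runs=5)
    summarize("patched", patched)
    summarize("stock", stock)
    print("improvement", improvement(patched, stock))
```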
The downside of the patch: for these modules it ignores the Python files
on disk; it doesn't even stat them. If you add stat calls, you lose half
of the speed improvement. I believe they added a work-around: a flag
(command-line? environment variable? I don't know, I didn't go looking
for it) that tells Python "don't use the frozen modules", so that it
loads all those files from disk.
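To make the trade-off concrete, here is a sketch of the kind of
freshness check the patch skips: record each source file's mtime and
size when the data is generated (much like a timestamp-based .pyc header
records them), then stat the file at startup and fall back to disk on a
mismatch. Everything here is hypothetical: FrozenEntry, record,
frozen_copy_is_fresh, and the PYTHON_IGNORE_FROZEN_STARTUP variable are
made-up names, not the patch's actual opt-out flag.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenEntry:
    """Hypothetical record stored alongside each precreated module."""
    path: str
    mtime_ns: int
    size: int


def record(path):
    """Capture the source file's metadata at build time."""
    st = os.stat(path)
    return FrozenEntry(path, st.st_mtime_ns, st.st_size)


def frozen_copy_is_fresh(entry, env=os.environ):
    """Return True if the frozen copy may be used instead of the file on disk."""
    # Hypothetical opt-out switch: force loading everything from disk.
    if env.get("PYTHON_IGNORE_FROZEN_STARTUP"):
        return False
    try:
        # This per-module stat() is the cost the patch avoids.
        st = os.stat(entry.path)
    except FileNotFoundError:
        # Design choice for this sketch: with no source on disk, trust the frozen copy.
        return True
    return st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size
```

One stat() per startup module is cheap individually, but with dozens of
modules imported at startup the syscalls add up, which is consistent
with losing a large share of the speedup.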
I don't propose to merge the patch in its current state. I think it
would need a lot of work, both in terms of "doing things the way Python
does it" and in terms of plain code smell (the serializer is implemented
in both C and Python and jumps back and forth between them; also, the
build process for the serialized modules is pretty tiresome).
Is it worth working on?