[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Gregory P. Smith greg at krypto.org
Wed Sep 19 20:34:01 EDT 2018


On Sat, Sep 15, 2018 at 2:53 AM Paul Moore <p.f.moore at gmail.com> wrote:

> On Fri, 14 Sep 2018 at 23:28, Neil Schemenauer <nas-python at arctrix.com>
> wrote:
> >
> > On 2018-09-14, Larry Hastings wrote:
> > > [..] adding the stat calls back in costs you half the startup.  So
> > > any mechanism where we're talking to the disk _at all_ simply
> > > isn't going to be as fast.
> >
> > Okay, so if we use hundreds of small .pyc files scattered all over
> > the disk, that's bad?  Who would have thunk it. ;-P
> >
> > We could have a new format, .pya (compiled python archive) that has
> > data for many .pyc files in it.  In normal runs you would have one
> > or just a handful of these things (e.g. one for stdlib, one for
> > your app and all the packages it uses).  Then you mmap these just
> > once and rely on OS page faults to bring in the data as you need it.
> > The .pya would have a hash table at the start or end that tells you
> > the offset for each module.
>
> Isn't that essentially what putting the stdlib in a zipfile does? (See
> the windows embedded distribution for an example). It probably uses
> normal IO rather than mmap, but maybe adding a "use mmap" flag to the
> zipfile module would be a more general enhancement that zipimport
> could use for free.
>
> Paul
>
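
For concreteness, a rough sketch of what the .pya idea above could look
like (the format and all of the names here are invented purely for
illustration): one file holding many marshalled code objects, an offset
table at the front, and mmap so the OS pages data in on demand.

import marshal
import mmap
import struct

def build_pya(path, sources):
    # sources: dict mapping module name -> source text.
    blobs = {name: marshal.dumps(compile(src, name, "exec"))
             for name, src in sources.items()}
    # Offset table: a marshalled list of (name, offset, size) entries.
    index, offset = [], 0
    for name, blob in blobs.items():
        index.append((name, offset, len(blob)))
        offset += len(blob)
    header = marshal.dumps(index)
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        for blob in blobs.values():
            f.write(blob)

def load_from_pya(path, name):
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (hsize,) = struct.unpack("<I", mm[:4])
    index = {n: (off, size)
             for n, off, size in marshal.loads(mm[4:4 + hsize])}
    off, size = index[name]
    base = 4 + hsize
    code = marshal.loads(mm[base + off:base + off + size])
    namespace = {}
    exec(code, namespace)  # a real importer would build a module object
    return namespace

# build_pya("example.pya", {"greeting": "message = 'hello'"})
# print(load_from_pya("example.pya", "greeting")["message"])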

To share a lesson learned: putting the stdlib in a zip file is doable,
but it comes with caveats that would likely make OS distros want to undo
the change if it were done with CPython today:

We did that for one of our pre-built Python 2.7 distributions used
internally at Google in the 2012-2014 timeframe, thinking at the time
"yay, fewer inodes, less disk space, and fewer stat calls by the
interpreter on all machines."

The caveat we didn't anticipate was that zipimport.c cannot handle the
zip file changing out from underneath a running process.  Ever.  It does
not hold an open file handle to the zip file (which on POSIX systems
would ameliorate the problem); instead it regularly reopens the file by
name while relying on a zip index cached at startup time.  So when you
deploy a change to your Python interpreter (as with any OS distro package
update, security update, or upgrade), an existing running process that
goes on to import a stdlib module it hadn't already imported reads the
new zip file through the cached index of the old one and... boom.
(Statistically that late import is likely to be a codec-related module,
as those are often imported on first use rather than at startup time the
way people tend to structure their code.)  The result is a strange
rolling error in production that is not pretty to debug.  Fixing
zipimport.c to deal with this properly was tried, but it still ran into
issues and was ultimately deemed infeasible.  There's a BPO issue or
three filed about this if you go hunting.
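
To make that failure mode concrete, here is a toy model (this is not
zipimport's actual code, just an illustration of the stale cached index
problem):

def write_archive(path, members):
    # members: dict of name -> bytes.  Returns a name -> (offset, size)
    # index describing the file that was just written.
    index, offset = {}, 0
    with open(path, "wb") as f:
        for name, data in members.items():
            index[name] = (offset, len(data))
            f.write(data)
            offset += len(data)
    return index

class ToyImporter:
    def __init__(self, path, index):
        self.path = path
        self.index = index          # cached at startup, never refreshed

    def load(self, name):
        offset, size = self.index[name]
        with open(self.path, "rb") as f:   # re-opened by name every time
            f.seek(offset)
            return f.read(size)

# First deploy: the process starts and caches the index.
index = write_archive("stdlib.toy",
                      {"os": b"OS MODULE V1", "codecs": b"CODECS V1"})
imp = ToyImporter("stdlib.toy", index)
print(imp.load("os"))       # b'OS MODULE V1' -- fine

# An upgrade replaces the archive; offsets and sizes all change.
write_archive("stdlib.toy",
              {"os": b"OS2", "codecs": b"CODECS MODULE V2"})

# The long-running process now imports something it hadn't touched yet
# and reads garbage through the stale index.
print(imp.load("codecs"))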

By contrast, having compiled-in constants in the executable is fine and
will never suffer from this problem.  Those are mapped as read-only data
by the dynamic loader and demand paged.  No complicated code is needed in
CPython to manage them, aside from the import-interception logic for the
stdlib startup modules (which should be reasonably small; I haven't
looked at the patch in the PR yet).
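
For a sense of what those compiled-in constants can look like, here's a
minimal sketch in the spirit of CPython's existing freeze machinery:
marshal a module's code object and emit it as a read-only C byte array
plus a frozen-table entry.  The symbol naming and table wiring here are
illustrative, not what the actual patch does.

import marshal

def freeze_to_c(module_name, source, filename="<frozen>"):
    code = compile(source, filename, "exec")
    data = marshal.dumps(code)
    symbol = "_Py_M__" + module_name.replace(".", "_")
    lines = ["static const unsigned char %s[] = {" % symbol]
    for i in range(0, len(data), 16):
        chunk = ", ".join(str(b) for b in data[i:i + 16])
        lines.append("    " + chunk + ",")
    lines.append("};")
    lines.append('/* struct _frozen table entry: {"%s", %s, %d} */'
                 % (module_name, symbol, len(data)))
    return "\n".join(lines)

print(freeze_to_c("hello", "print('hello from frozen code')"))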

There's ongoing work to rewrite zipimport.c in Python on top of the
zipfile module.  If that is used for the stdlib, everything it needs will
itself have to be frozen into C data, similar to the existing bootstrap
import logic.  Being a different implementation of the zip-reading code,
it might manage to avoid the caveat above.  But storing the data on the C
side still sounds like a much simpler code path to me.

The maintenance concern is mostly about testing and the build: making
sure we include everything the interpreter needs and keep it up to date.
I'd like a configure flag controlling whether the feature is on by
default; when it is off by default, it could still be enabled with an
interpreter command-line flag.  Consider adding that configure flag to
the set of things --with-optimizations turns on for people.

Don't be surprised if Facebook reports a startup-time speedup greater
than anything you measure yourself.  Their applications are different,
and if they're using their XAR system, which mounts applications as a
FUSE filesystem, the additional kernel round trips increase stat()
overhead beyond what it already is, so that design will benefit even
more.

Any startup-time savings from not making a crazy number of sequential,
high-latency, blocking system calls is a good thing regardless, and not
just for command line tools.  Serving applications that are starting up
are effectively spinning, consuming CPUs to compute the same result
everywhere, for every application, every time, before performing useful
work...  You can measure such an optimization in a worthwhile amount of $
or carbon footprint saved around the world.  Heat death of the universe
by a billion cuts.  Thanks for working on this!

-G