[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Paul Moore p.f.moore at gmail.com
Sat Sep 15 05:53:20 EDT 2018


On Fri, 14 Sep 2018 at 23:28, Neil Schemenauer <nas-python at arctrix.com> wrote:
>
> On 2018-09-14, Larry Hastings wrote:
> > [..] adding the stat calls back in costs you half the startup.  So
> > any mechanism where we're talking to the disk _at all_ simply
> > isn't going to be as fast.
>
> Okay, so if we use hundreds of small .pyc files scattered all over
> the disk, that's bad?  Who would have thunk it. ;-P
>
> We could have a new format, .pya (compiled python archive) that has
> data for many .pyc files in it.  In normal runs you would have one
> or just a handful of these things (e.g. one for stdlib, one for
> your app and all the packages it uses).  Then you mmap these just
> once and rely on OS page faults to bring in the data as you need it.
> The .pya would have a hash table at the start or end that tells you
> the offset for each module.

Isn't that essentially what putting the stdlib in a zipfile does? (See
the Windows embeddable distribution for an example). It probably uses
normal IO rather than mmap, but maybe adding a "use mmap" flag to the
zipfile module would be a more general enhancement that zipimport
could use for free.
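
For anyone who hasn't tried it, here is a minimal sketch of importing from
a zipfile. The archive name (bundle.zip) and the module name (bundled_demo)
are made up for illustration, and this goes through zipimport's ordinary
file I/O; no mmap is involved.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a small archive holding one module.
# "bundle.zip" and "bundled_demo" are hypothetical names for this sketch.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("bundled_demo.py", "ANSWER = 42\n")

# Putting the archive itself on sys.path lets zipimport serve modules from it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
bundled_demo = importlib.import_module("bundled_demo")

# The module's file path points inside the archive, and its loader is
# zipimport's importer rather than the normal filesystem loader.
print(bundled_demo.__file__)
print(type(bundled_demo.__loader__).__name__)
```

As far as I know, zipimport reads the archive's directory and caches it,
so finding further modules in the same archive is a lookup in that cache
rather than another round of stat calls on the filesystem.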

Paul

