[Python-Dev] A fast startup patch (was: Python startup time)

Nathaniel Smith njs at pobox.com
Fri May 4 21:58:27 EDT 2018


What are the obstacles to including "preloaded" objects in regular .pyc
files, so that everyone can take advantage of this without rebuilding the
interpreter?

Off the top of my head:

We'd be making the in-memory layout of those objects part of the .pyc
format, so we couldn't change that within a minor release. I suspect this
wouldn't be a big change though, since we already commit to ABI
compatibility for C extensions within a minor release? In principle there
are some cases where this would be different (e.g. adding new fields at the
end of an object is generally ABI compatible), but this might not be an
issue for the types of objects we're talking about.

There's some memory management concern, since these are, y'know, heap
objects, and we wouldn't be heap allocating them. The main constraint would
be that you couldn't free them one at a time, but would have to free the
whole block at once. But I think it at least wouldn't be too hard to track
whether any of the objects in the block are still alive, and free the whole
block if there aren't any. E.g., we could have an object flag that means
"when this object is freed, don't call free(); instead find the containing
block and decrement its live-object count". You probably need this flag even
in the current version, right? (And the flag could also be an escape hatch
if we did need to change object size: check for the flag before accessing
the new fields.) Or maybe you could get clever tracking object liveness on
a page-by-page basis; not sure it's worth it though. Unloading
module-level objects is pretty rare.
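
To make the live-object-count idea concrete, here is a minimal sketch in
Python that only models the bookkeeping. It is not how CPython or the patch
under discussion works: the names PreloadedBlock, Obj, and dealloc are
hypothetical, and the "flag" is modelled as a non-None reference to the
containing block.

```python
class PreloadedBlock:
    """Hypothetical stand-in for one block of preloaded objects."""

    def __init__(self, n_objects):
        self.live = n_objects      # objects in the block that are still alive
        self.released = False      # True once the whole block has been released

    def object_freed(self):
        self.live -= 1
        if self.live == 0:
            # Stands in for freeing or unmapping the whole block at once.
            self.released = True


class Obj:
    """Hypothetical object; `block` plays the role of the 'preloaded' flag."""

    def __init__(self, block=None):
        self.block = block


def dealloc(obj, heap_frees):
    """Model of the deallocation path: flagged objects skip the normal free."""
    if obj.block is not None:
        # Flag set: do not free individually, just decrement the block's count.
        obj.block.object_freed()
    else:
        # Ordinary heap object: record that it would be passed to free().
        heap_frees.append(obj)
```

In this model the block is only released when its last object dies, which
matches the "free the whole block at once" constraint described above.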

I'm assuming these objects can have pointers to each other, and to well
known constants like None, so you need some kind of relocation engine to
fix those up. Right now I guess you're probably using the one built into
the dynamic loader? In theory it shouldn't be too hard to write our own –
basically just a list of offsets in the block where we need to add the base
address or write the address of a well known constant, I think?
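
As a rough illustration of that "list of offsets" idea, the sketch below
patches pointer slots in a byte buffer. Everything here is an assumption made
for illustration: the 8-byte little-endian slot width, the relocate function,
and the shape of the fixup lists are hypothetical and not taken from the
patch.

```python
import struct

# Assumed pointer slot format: 8 bytes, little-endian, unsigned.
PTR = struct.Struct("<Q")


def relocate(block, base, internal_offsets, constant_fixups, constants):
    """Patch pointer slots in `block` (a bytearray) in place.

    internal_offsets: offsets of slots that hold block-relative pointers;
        the load address `base` is added to each.
    constant_fixups: (offset, name) pairs for slots that must point at a
        well-known constant such as None.
    constants: mapping from constant name to its address in this process.
    """
    for off in internal_offsets:
        (rel,) = PTR.unpack_from(block, off)
        PTR.pack_into(block, off, base + rel)
    for off, name in constant_fixups:
        PTR.pack_into(block, off, constants[name])
    return block
```

For example, a slot that stores the block-relative offset 16 would hold
base + 16 after relocation, while a slot listed in constant_fixups is simply
overwritten with the address supplied for that constant.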

Anything else I'm missing?

On Fri, May 4, 2018, 16:06 Carl Shapiro <carl.shapiro at gmail.com> wrote:

> On Fri, May 4, 2018 at 5:14 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> This definitely seems interesting, but is it something you'd be seeing us
>> being able to take advantage of for conventional Python installations, or
>> is it more something you'd expect to be useful for purpose-built
>> interpreter instances? (e.g. if Mercurial were running their own Python,
>> they could precache the heap objects for their commonly imported modules in
>> their custom interpreter binary, regardless of whether those were standard
>> library modules or not).
>>
>
> Yes, this would be a win for a conventional Python installation as well.
> Specifically, users and their scripts would enjoy a reduction in
> cold-startup time.
>
> In the numbers I showed yesterday, the version of the interpreter with our
> patch applied included unmarshaled data for the modules that always appear
> on the sys.modules list after an ordinary interpreter cold-start.  I
> believe it is worthwhile to include that set of modules in the standard
> CPython interpreter build.  Expanding that set to include the commonly
> imported modules might be an additional win, especially for short-running
> scripts.
