What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?

Off the top of my head:

We'd be making the in-memory layout of those objects part of the .pyc format, so we couldn't change that within a minor release. I suspect this wouldn't be a big change though, since we already commit to ABI compatibility for C extensions within a minor release? In principle there are some cases where this would be different (e.g. adding new fields at the end of an object is generally ABI compatible), but this might not be an issue for the types of objects we're talking about.

There's some memory management concern, since these are, y'know, heap objects, and we wouldn't be heap allocating them. The main constraint would be that you couldn't free them one at a time, but would have to free the whole block at once. But I think it at least wouldn't be too hard to track whether any of the objects in the block are still alive, and free the whole block if there aren't any. E.g., we could have an object flag that means "when this object is freed, don't call free(), instead find the containing block and decrement its live-object count". You probably need this flag even in the current version, right? (And the flag could also be an escape hatch if we did need to change object size: check for the flag before accessing the new fields.) Or maybe you could get clever tracking object liveness on a page-by-page basis; not sure it's worth it though. Unloading module-level objects is pretty rare.
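
To make the live-count idea concrete, here is a rough sketch in Python rather than C. Everything in it is hypothetical: PreloadedBlock, Obj, and dealloc are made-up names, and a non-None "block" attribute stands in for the object flag. It only illustrates the bookkeeping, not how CPython would actually do it.

```python
class PreloadedBlock:
    """Hypothetical stand-in for one preloaded region of objects."""

    def __init__(self, n_objects):
        self.live = n_objects   # objects in the block that are still alive
        self.released = False   # set once the whole block has been freed

    def release(self):
        # In a real implementation this would unmap or free the region.
        self.released = True


class Obj:
    """Hypothetical object; block is None for ordinary heap objects."""

    def __init__(self, block=None):
        self.block = block      # non-None plays the role of the flag


def dealloc(obj, heap_free):
    """Free one object, honoring the 'lives in a preloaded block' flag."""
    block = obj.block
    if block is None:
        heap_free(obj)          # ordinary path: free just this object
        return
    block.live -= 1             # flagged path: never free individually
    if block.live == 0:
        block.release()         # last object gone: drop the whole block
```

With two objects in a block, the block is released only when the second one is deallocated, and heap_free is never called for either of them.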

I'm assuming these objects can have pointers to each other, and to well known constants like None, so you need some kind of relocation engine to fix those up. Right now I guess you're probably using the one built into the dynamic loader? In theory it shouldn't be too hard to write our own – basically just a list of offsets in the block where we need to add the base address or write the address of a well known constant, I think?
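
Roughly what I have in mind for the relocation step, sketched in Python over a bytearray. The assumptions are mine: 64-bit little-endian pointer slots, internal pointers stored as block-relative offsets, and a made-up table mapping slot offsets to names of well-known constants. None of this is taken from the actual patch.

```python
import struct

PTR = struct.Struct("<Q")  # assumption: 64-bit little-endian pointers


def relocate(block, base, internal_offsets, constant_slots, constants):
    """Patch pointer slots in a loaded block, in place.

    block            -- bytearray holding the raw object data
    base             -- address the block was loaded at
    internal_offsets -- slot offsets whose stored value is block-relative
    constant_slots   -- dict: slot offset -> name of a well-known constant
    constants        -- dict: constant name -> its address in this process
    """
    for off in internal_offsets:
        (rel,) = PTR.unpack_from(block, off)
        PTR.pack_into(block, off, base + rel)        # add the base address
    for off, name in constant_slots.items():
        PTR.pack_into(block, off, constants[name])   # write constant's address
    return block
```

So the relocation table is just the two offset lists, and applying it is a single pass over the block at load time.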

Anything else I'm missing?

On Fri, May 4, 2018, 16:06 Carl Shapiro <carl.shapiro@gmail.com> wrote:
On Fri, May 4, 2018 at 5:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This definitely seems interesting, but is it something you'd be seeing us being able to take advantage of for conventional Python installations, or is it more something you'd expect to be useful for purpose-built interpreter instances? (e.g. if Mercurial were running their own Python, they could precache the heap objects for their commonly imported modules in their custom interpreter binary, regardless of whether those were standard library modules or not).

Yes, this would be a win for a conventional Python installation as well.  Specifically, users and their scripts would enjoy a reduction in cold-startup time.

In the numbers I showed yesterday, the version of the interpreter with our patch applied included unmarshaled data for the modules that always appear on the sys.modules list after an ordinary interpreter cold-start.  I believe it is worthwhile to include that set of modules in the standard CPython interpreter build.  Expanding that set to include the commonly imported modules might be an additional win, especially for short-running scripts.
 
