(belated follow-up as I noticed there hadn't been a reply on list yet, just the previous feedback on the faster-cpython ticket) On Mon, 21 Feb 2022, 6:53 pm Yichen Yan via Python-Dev, < python-dev@python.org> wrote:
Hi folks, as illustrated in faster-cpython#150 [1], we have implemented a mechanism that supports data persistence of a subset of Python data types with mmap, which can reduce package import time by caching code objects. This could be seen as a more eager pyc format: both serve the same purpose, but our approach tries to avoid [de]serialization. As a result, we get a speedup in overall Python startup of ~15%.
This certainly sounds interesting!
Currently, we’ve made it a third-party library and have been working on open-sourcing it.
Our implementation (whose non-official name is “pycds”) mainly contains two parts:
- importlib hooks: these implement the mechanism to dump code objects to an archive, plus a `Finder` that supports loading code objects from mapped memory.
- Dumping and loading a subset of Python types with mmap. In this part, we deal with 1) ASLR, by patching `ob_type` fields; 2) hash seed randomization, by supporting only basic types that don’t have a hash-based layout (i.e. dict is not supported); 3) string interning, by re-interning strings while loading the mmap archive; and so on.
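To make the first bullet concrete, here is a minimal sketch (not pycds itself) of a meta path finder that serves pre-built code objects from an in-memory cache, standing in for the mmap-backed archive; the names `CodeCacheFinder`, `CodeCacheLoader`, and `demo_mod` are illustrative assumptions:

```python
import sys
import importlib.abc
import importlib.util


class CodeCacheLoader(importlib.abc.Loader):
    def __init__(self, code):
        self._code = code

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Execute the cached code object directly -- no unmarshalling step,
        # which is where the startup saving would come from.
        exec(self._code, module.__dict__)


class CodeCacheFinder(importlib.abc.MetaPathFinder):
    def __init__(self, cache):
        self._cache = cache  # module name -> code object

    def find_spec(self, fullname, path=None, target=None):
        code = self._cache.get(fullname)
        if code is None:
            return None  # fall through to the normal finders
        return importlib.util.spec_from_loader(
            fullname, CodeCacheLoader(code))


# Populate the cache with a compiled module and install the finder
# ahead of the standard finders on sys.meta_path.
cache = {"demo_mod": compile("ANSWER = 42", "<cached>", "exec")}
sys.meta_path.insert(0, CodeCacheFinder(cache))

import demo_mod
print(demo_mod.ANSWER)  # -> 42
```

In the real library the cache lookup would presumably return a code object already materialised in the mapped region rather than one built with `compile()`.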
I assume the files wouldn't be portable across architectures, so does the cache file naming scheme take that into account?
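One way such a scheme could look (purely an assumption for illustration, not pycds's actual naming) is to fold the interpreter cache tag, machine architecture, and platform into the archive file name:

```python
import sys
import platform


def arch_cache_name(base):
    """Build an architecture-specific archive name, since a memory-mapped
    archive is not portable across interpreters or architectures.
    The ".img" suffix and overall scheme are illustrative assumptions."""
    tag = sys.implementation.cache_tag  # e.g. "cpython-311"
    arch = platform.machine().lower()   # e.g. "x86_64", "arm64"
    plat = sys.platform                 # e.g. "linux", "darwin", "win32"
    return f"{base}.{tag}-{arch}-{plat}.img"


print(arch_cache_name("app"))
```

A mismatched tag at load time would then simply cause a cache miss and fall back to the regular import path.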
(The idea is interesting regardless of whether it produces arch-specific files - kind of a middle ground between portable serialisation-based pycs and fully frozen modules.) Cheers, Nick.