(belated follow-up as I noticed there hadn't been a reply on list yet, just the previous feedback on the faster-cpython ticket)

On Mon, 21 Feb 2022, 6:53 pm Yichen Yan via Python-Dev, <python-dev@python.org> wrote:

Hi folks, as illustrated in faster-cpython#150 [1], we have implemented a mechanism that supports data persistence of a subset of Python data types with mmap, and can therefore reduce package import time by caching code objects. This could be seen as a more eager pyc format, as both serve the same purpose, but our approach tries to avoid [de]serialization. As a result, we get a speedup in overall Python startup of ~15%.


This certainly sounds interesting! 


Currently, we’ve made it a third-party library and have been working on open-sourcing it.


Our implementation (whose unofficial name is “pycds”) mainly contains two parts:

  • importlib hooks: these implement the mechanism to dump code objects to an archive, plus a `Finder` that supports loading code objects from mapped memory.
  • Dumping and loading (a subset of) Python types with mmap. In this part, we deal with 1) ASLR, by patching `ob_type` fields; 2) hash seed randomization, by supporting only basic types that don’t have a hash-based layout (i.e. dict is not supported); 3) interned strings, by re-interning strings while loading the mmap archive; and so on.
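
(For readers unfamiliar with importlib hooks, the first bullet can be illustrated with a rough sketch. This is not the pycds code: `ArchiveFinder`, the `archive` dict and the `pycds_demo` module name are all made up for illustration, and a plain dict of already-compiled code objects stands in for the mmap-backed archive.)

```python
import importlib.abc
import importlib.util
import sys


class ArchiveFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves ready-made code objects.

    A plain dict stands in for the mapped archive; a real implementation
    would hand out code objects that live in mmap'ed memory instead.
    """

    def __init__(self, archive):
        self.archive = archive  # maps module name -> code object

    def find_spec(self, fullname, path=None, target=None):
        # Claim only modules present in the archive; returning None lets
        # the regular import machinery handle everything else.
        if fullname in self.archive:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        # No unmarshalling step: the code object is executed as-is.
        exec(self.archive[module.__name__], module.__dict__)


# Illustrative archive containing a single module.
archive = {"pycds_demo": compile("ANSWER = 42", "<archive>", "exec")}
sys.meta_path.insert(0, ArchiveFinder(archive))

import pycds_demo

print(pycds_demo.ANSWER)
```

Putting the finder at the front of `sys.meta_path` means it is consulted before the normal path-based finders, which is what lets cached code objects take priority over pyc files.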

I assume the files wouldn't be portable across architectures, so does the cache file naming scheme take that into account?

(The idea is interesting regardless of whether it produces arch-specific files - kind of a middle ground between portable serialisation-based pycs and fully frozen modules)
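
To make the naming question concrete, here is a minimal sketch of one way an archive name could encode the interpreter and architecture. The `arch_cache_tag` and `archive_name` helpers and the `.pycds` suffix are hypothetical, not anything pycds is known to use; only `sys.implementation.cache_tag`, `sys.platform`, `sys.maxsize` and `platform.machine()` are real.

```python
import platform
import sys


def arch_cache_tag():
    """Return a hypothetical tag identifying interpreter, OS and architecture."""
    # cache_tag is the same tag used in pyc file names (e.g. "cpython-310");
    # it can be None on implementations that do not cache bytecode.
    impl_tag = sys.implementation.cache_tag or "unknown"
    # platform.machine() may return an empty string if it cannot be determined.
    machine = platform.machine().lower().replace(" ", "_") or "unknown"
    # Pointer width matters because object layouts are stored verbatim.
    pointer_bits = 64 if sys.maxsize > 2**32 else 32
    return f"{impl_tag}-{sys.platform}-{machine}-{pointer_bits}bit"


def archive_name(app_name):
    """Build an arch-specific archive file name (".pycds" is a made-up suffix)."""
    return f"{app_name}.{arch_cache_tag()}.pycds"


print(archive_name("myapp"))
```

A tag along these lines would let archives for different architectures sit side by side, much as extension modules already carry a platform tag in their file names.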

Cheers,
Nick.