I assume the files wouldn't be portable across architectures, so does the cache file naming scheme take that into account?

(The idea is interesting regardless of whether it produces arch-specific files - kind of a middle ground between portable serialisation-based pycs and fully frozen modules.)
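For comparison, CPython's own pyc files already embed an interpreter tag in their names; an arch-aware scheme could extend that idea with a platform tag. The sketch below is purely illustrative - the helper name and the `.img` suffix are made up, not anything pycds is known to use:

```python
import sys
import sysconfig

def arch_specific_cache_name(module_name: str) -> str:
    # Hypothetical scheme: combine the interpreter's cache tag
    # (e.g. "cpython-310", the tag used in .pyc filenames) with the
    # build platform (e.g. "linux-x86_64"), so an archive dumped on
    # one architecture is never picked up on another.
    impl_tag = sys.implementation.cache_tag  # varies per interpreter version
    platform_tag = sysconfig.get_platform()  # varies per architecture/OS
    return f"{module_name}.{impl_tag}-{platform_tag}.img"

print(arch_specific_cache_name("json"))
# e.g. "json.cpython-310-linux-x86_64.img"
```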
On Mar 20, 2022, at 23:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
(Belated follow-up, as I noticed there hadn't been a reply on list yet, just the previous feedback on the faster-cpython ticket.)

On Mon, 21 Feb 2022, 6:53 pm Yichen Yan via Python-Dev, <python-dev@python.org> wrote:

> Hi folks, as illustrated in faster-cpython#150 [1], we have implemented a
> mechanism that supports data persistence of a subset of Python data types
> with mmap, and can therefore reduce package import time by caching code
> objects. This could be seen as a more eager pyc format, as both serve the
> same purpose, but our approach tries to avoid [de]serialization.
> As a result, we get a speedup of ~15% in overall Python startup.

This certainly sounds interesting!

> Currently, we've made it a third-party library and have been working on
> open-sourcing it.
>
> Our implementation (whose non-official name is "pycds") mainly contains
> two parts:
> 1. importlib hooks: this implements the mechanism to dump code objects to
>    an archive, plus a `Finder` that supports loading code objects from
>    mapped memory.
> 2. Dumping and loading (a subset of) Python types with mmap. In this part,
>    we deal with 1) ASLR, by patching `ob_type` fields; 2) hash seed
>    randomization, by supporting only basic types that don't have a
>    hash-based layout (i.e. dict is not supported); 3) interned strings, by
>    re-interning strings while loading the mmap archive; and so on.

Cheers,
Nick.
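A minimal sketch of the first part (the importlib hook) may help make the flow concrete. It registers a meta path finder that serves modules whose code objects are already in memory, with a plain dict standing in for the mmap-mapped archive; the class and names below are hypothetical, not the actual pycds API:

```python
import importlib.abc
import importlib.util
import sys

class MappedCodeFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules whose code objects are already held in memory.

    A plain dict stands in for the mmap-backed archive here; the class
    and attribute names are illustrative, not the actual pycds API.
    """

    def __init__(self, archive):
        self._archive = archive  # {module name: code object}

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._archive:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; fall through to the regular finders

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        # Execute the cached code object in the fresh module namespace,
        # skipping source lookup, compilation, and pyc unmarshalling.
        exec(self._archive[module.__name__], module.__dict__)

# "Dump" a module's code object, then import it through the finder.
archive = {"fastmod": compile("ANSWER = 42", "<archive>", "exec")}
sys.meta_path.insert(0, MappedCodeFinder(archive))

import fastmod
print(fastmod.ANSWER)  # 42
```

The real implementation would map the archive with mmap and fix up object headers as described above; the dict merely shows where the finder plugs into the import machinery.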