On Fri, May 4, 2018 at 6:58 PM, Nathaniel Smith <njs@pobox.com> wrote:
What are the obstacles to including "preloaded" objects in regular .pyc files, so that everyone can take advantage of this without rebuilding the interpreter?
The system we have developed can create a shared object file for each compiled Python file. However, such a representation is not directly usable. First, certain shared constants, such as interned strings, must be kept globally unique across object code files. Second, some marshaled objects, such as the hashed collections, must be initialized with randomization state that is not available until after the hosting runtime has been initialized. We are able to work around the first issue by generating a heap image with the transitive closure of all modules that will be loaded which allows us to easily maintain uniqueness guarantees. We are able to work around the second issue with some unobservable changes to the affected data structures. Based on our numbers, it appears there should be some hesitancy--at this time--to changing the format of compiled Python file for the sake of load-time performance. In contrast, the data shows that a focused change to address file system inefficiencies has the potential to broadly and transparently deliver benefit to users without affecting existing code or workflows.