On 23 July 2017 at 09:35, Steve Dower email@example.com wrote:
Yes, I’m aware of that, which is why I don’t have any specific suggestions off-hand. But given the differences in file systems between Windows and other OSs, it wouldn’t surprise me if there were a more optimal approach for NTFS to amortize calls better. Perhaps not, but it is still the most expensive part of startup that we have any ability to change, so it’s worth investigating.
That does remind me of a capability we haven't played with a lot recently:
    $ python3 -m site
    sys.path = [
        '/home/ncoghlan',
        '/usr/lib64/python36.zip',
        '/usr/lib64/python3.6',
        '/usr/lib64/python3.6/lib-dynload',
        '/home/ncoghlan/.local/lib/python3.6/site-packages',
        '/usr/lib64/python3.6/site-packages',
        '/usr/lib/python3.6/site-packages',
    ]
    USER_BASE: '/home/ncoghlan/.local' (exists)
    USER_SITE: '/home/ncoghlan/.local/lib/python3.6/site-packages' (exists)
    ENABLE_USER_SITE: True
The interpreter puts a zip file ahead of the regular unpacked standard library on sys.path because at one point in time that was a useful optimisation technique for reducing import costs on application startup. It was a potentially big win with the old "multiple stat calls" import implementation, but I'm not aware of any more recent benchmarks relative to the current listdir-caching based import implementation.
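As a reminder of how the zip entry on sys.path behaves, here is a minimal sketch that builds a throwaway archive and imports a module from it. The names `import_from_zip`, `bundle.zip` and `zip_demo_module` are made up for illustration; only the standard `zipfile`, `importlib` and `sys.path` machinery is assumed.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write `source` into a throwaway zip archive and import it from there."""
    # The temporary directory is deliberately left behind for the OS to
    # clean up, to keep the sketch short.
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "bundle.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)

    # Put the archive ahead of everything else, just as the interpreter
    # puts pythonXY.zip ahead of the unpacked standard library.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)
    return module


demo = import_from_zip("zip_demo_module", "ANSWER = 42\n")
print(type(demo.__spec__.loader).__name__)
print(demo.ANSWER)
```

The loader reported for the module is the zip importer rather than SourceFileLoader, which is the difference any benchmark of the zip approach would be measuring.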
So I think some interesting experiments to try measuring might be:
- pushing the "always imported" modules into a dedicated zip archive
- having the interpreter pre-seed sys.modules with the contents of that dedicated archive
- freezing those modules and building them into the interpreter that way
- compiling the standalone top-level modules with Cython, and loading them as extension modules
- compiling in the Cython generated modules as builtins (not currently an option for packages & submodules due to )
The nice thing about those kinds of approaches is that they're all fairly general purpose, and relate primarily to how the Python interpreter is put together, rather than how the individual modules are written in the first place.
(I'm not volunteering to run those experiments, though - just pointing out some of the technical options we have available to us that don't involve adding more handcrafted C extension modules to CPython)
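For anyone who does want to try the experiments listed above, a baseline measurement is the first thing needed. The following is only a rough sketch of one way to get it, assuming that timing a fresh `python -c pass` subprocess is a good enough proxy for startup cost; `time_startup` is a hypothetical helper name, and `-S` (skip the implicit `import site`) is used purely as an example of a variant to compare against.

```python
import subprocess
import sys
import time


def time_startup(extra_args=(), repeats=5):
    """Return the fastest wall-clock time (seconds) to run `python -c pass`."""
    command = [sys.executable, *extra_args, "-c", "pass"]
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other activity on the machine.
    return min(timings)


if __name__ == "__main__":
    baseline = time_startup()
    no_site = time_startup(["-S"])
    print(f"default startup: {baseline * 1000:.1f} ms")
    print(f"without site:    {no_site * 1000:.1f} ms")
```

The same helper could be pointed at a custom interpreter build by replacing `sys.executable`, which is how the zip, frozen and Cython variants would be compared against each other.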
P.S. Checking the current list of source modules implicitly loaded at startup, I get:
    import sys
    sorted(k for k, m in sys.modules.items()
           if m.__spec__ is not None
           and type(m.__spec__.loader).__name__ == "SourceFileLoader")
['_collections_abc', '_sitebuiltins', '_weakrefset', 'abc', 'codecs', 'encodings', 'encodings.aliases', 'encodings.latin_1', 'encodings.utf_8', 'genericpath', 'io', 'os', 'os.path', 'posixpath', 'rlcompleter', 'site', 'stat']
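The same check can be generalised to show every loader in use at startup, not just SourceFileLoader. This is a sketch rather than anything official: `modules_by_loader` is a hypothetical helper, and the grouping relies on the fact that some loaders (such as BuiltinImporter) are stored on the spec as the class itself rather than as an instance.

```python
import sys
from collections import defaultdict


def modules_by_loader():
    """Group the names in sys.modules by the name of their loader type."""
    groups = defaultdict(list)
    # Copy the items so imports triggered elsewhere can't change the dict
    # while it is being iterated.
    for name, module in list(sys.modules.items()):
        spec = getattr(module, "__spec__", None)
        loader = getattr(spec, "loader", None)
        if loader is None:
            key = "<no loader>"
        elif isinstance(loader, type):
            # The loader is a class used directly, not an instance.
            key = loader.__name__
        else:
            key = type(loader).__name__
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}


if __name__ == "__main__":
    for loader_name, names in sorted(modules_by_loader().items()):
        print(f"{loader_name}: {len(names)} modules")
```

Running it before and after one of the experiments would show whether the modules of interest actually moved from SourceFileLoader to a different loader.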