On Tue, May 1, 2018 at 11:55 PM, Ray Donnelly <mingw.android@gmail.com> wrote: 

> Is your Python interpreter statically linked? The Python 3 ones from the anaconda distribution (use Miniconda!) are for Linux and macOS and that roughly halved our startup times.

My Python interpreters use a shared library. I'll definitely investigate the performance of a statically-linked interpreter.

Correct me if I'm wrong, but aren't there downsides for C extension compatibility when there is no shared libpython? Or does all the packaging tooling "just work" without a libpython? (It's possible I have my wires crossed with something else regarding a statically linked Python.)

On Wed, May 2, 2018 at 2:26 AM, Victor Stinner <vstinner@redhat.com> wrote:
> What do you propose to make Python startup faster?

That's a very good question. I'm not sure I can answer it, because I haven't dug into CPython's internals much further than what is required to implement C extensions. But I can share insight from what the Mercurial project has collectively learned.

> As I wrote in my previous emails, many Python core developers care of
> the startup time and we are working on making it faster.
>
> INADA Naoki added -X importtime to identify slow imports and
> understand where Python spent its startup time.

-X importtime is a great start! For a follow-up enhancement, it would be useful to see what aspects of import are slow. Is it finding modules (involves filesystem I/O)? Is it unmarshaling pyc files? Is it executing the module code? If executing code, what part is slow? Inline statements/expressions? Compiling types? Printing the microseconds it takes to import a module is useful. But it only gives me a general direction: I want to know what parts of the import made it slow so I know if I should be focusing on code running during module import, slimming down the size of a module, eliminating the module import from fast paths, pursuing alternative module importers, etc.
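
To make that breakdown concrete, here is a rough sketch of timing the phases separately for a single pure-Python module. The helper name `time_import_phases` is made up for illustration, it relies only on public `importlib` calls, and it assumes the module's loader can return a code object (so it won't work for built-in or extension modules). It also runs the module in a scratch namespace rather than registering it in sys.modules, so treat it as a measurement toy, not a replacement importer.

```python
import importlib.util
import time


def time_import_phases(name):
    """Time the find, code-loading, and execution phases of importing `name`.

    Hypothetical helper: the module body runs in a scratch module object that
    is NOT registered in sys.modules.
    """
    start = time.perf_counter()
    spec = importlib.util.find_spec(name)  # path search (filesystem I/O)
    if spec is None:
        raise ModuleNotFoundError(name)
    found = time.perf_counter()

    # Reads and unmarshals the cached .pyc, or compiles the source if needed.
    code = spec.loader.get_code(name)
    if code is None:
        raise ImportError(f"{name!r} has no Python code object")
    loaded = time.perf_counter()

    module = importlib.util.module_from_spec(spec)
    exec(code, module.__dict__)  # run the module body
    executed = time.perf_counter()

    return {
        "find": found - start,
        "load_code": loaded - found,
        "exec": executed - loaded,
    }


print(time_import_phases("colorsys"))
```

The numbers from one run are noisy (the first call pays for cold filesystem caches), so anything like this would need repeated runs in fresh processes to be meaningful.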

> Recent example: Barry Warsaw identified that pkg_resources is slow and
> added importlib.resources to Python 3.7:
>
> Brett Cannon is also working on a standard solution for lazy imports
> since many years:

Mercurial has used lazy module imports for years. On 2.7.14, it reduces `hg version` from ~160ms to ~55ms (~34% of original). On Python 3, we're using `importlib.util.LazyLoader` and it reduces `hg version` on 3.7 from ~245ms to ~120ms (~49% of original). I'm not sure why Python 3's built-in module importer doesn't yield the speedup that our custom Python 2 importer does. One explanation is our custom importer is more advanced than importlib. Another is that Python 3's import mechanism is slower (possibly due to being written in Python instead of C). We haven't yet spent much time optimizing Mercurial for Python 3: our immediate goal is to get it working first. Given the startup performance problem on Python 3, it is only a matter of time before we dig into this further.
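
For readers who haven't used it, this is roughly what wiring up `importlib.util.LazyLoader` looks like. It follows the recipe in the importlib documentation and is not Mercurial's actual code; the helper name `lazy_import` and the choice of `colorsys` as the example module are just for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose body only executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy placeholder; runs no module code
    return module


colorsys = lazy_import("colorsys")
print(type(colorsys).__name__)  # still a lazy placeholder type at this point

colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the real load
print(type(colorsys).__name__)  # now an ordinary module
```

Note that finding the module still happens eagerly here; only execution of the module body is deferred.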

It's worth noting that common patterns can undo lazy module importing, most commonly `from foo import X`. It's *really* difficult to implement a proper object proxy, so Mercurial's lazy importer gives up in this case: it imports the module and exports the symbol. (But if the imported module is a package, we detect that and make the module's exports proxies to a lazy module.)

Another common way to undermine the lazy importer is code that accesses a module attribute while the importing module's body is executing at import time, e.g.

import foo

class myobject(foo.Foo):
    pass  # body omitted; evaluating the base class `foo.Foo` already forces `foo` to load

Mercurial goes out of its way to avoid these patterns so modules can be delay imported as much as possible. As long as import times are problematic, it would be helpful if the standard library adopted similar patterns. Although I recognize there are backwards compatibility concerns that tie your hands a bit.
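
A toy demonstration of the difference, using a deliberately minimal stand-in proxy: the `LazyModule` class and `is_loaded` helper below are hypothetical and are neither Mercurial's importer nor `LazyLoader`. A function body that touches the module defers the import until the function is called, while a class statement evaluates its base class immediately.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical minimal proxy: does the real import on first attribute miss."""

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before the real import.
        real = importlib.import_module(self.__name__)
        self.__dict__.update(real.__dict__)
        return getattr(real, attr)


def is_loaded(module):
    # A bare ModuleType has __spec__ = None until the real module is copied in.
    return vars(module).get("__spec__") is not None


json = LazyModule("json")  # stands in for `import json` under a lazy importer


def parse(text):
    return json.loads(text)  # attribute access happens only when parse() runs


print(is_loaded(json))  # False: defining parse() did not touch the module


class Decoder(json.JSONDecoder):  # the base class is looked up at definition time
    pass


print(is_loaded(json))  # True: the class statement forced the import
```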

> Nick Coghlan is working on the C API to configure Python startup: PEP
> 432. When it will be ready, maybe Mercurial could use a custom Python
> optimized for its use case.

That looks great!

The direction Mercurial is going in is that `hg` will likely become a Rust binary (instead of a #!python script) that will use an embedded Python interpreter. So we will have low-level control over the interpreter via the C API. I'd also like to see us distribute a copy of Python in our official builds. This will allow us to take various shortcuts, such as not having to probe various sys.path entries since certain packages can only exist in one place. I'd love to get to the state Google is at where they have self-contained binaries with ELF sections containing Python modules. But that requires a bit of very low-level hacking. We'll likely have a Rust binary (that possibly statically links libpython) and a separate JAR/zip-like file containing resources.

But many people obtain Python via their system package manager and no matter how hard we scream that Mercurial is a standalone application, they will configure their packages to link against the system libpython and use the system Python's standard library. This will potentially undo many of our startup time wins.

> IMHO Python import system is inefficient. We try too many alternative names.
>
> Example with Python 3.8:
>
> $ ./python -vv
> >>> import dontexist
> # trying /home/vstinner/prog/python/master/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/prog/python/master/dontexist.abi3.so
> # trying /home/vstinner/prog/python/master/dontexist.so
> # trying /home/vstinner/prog/python/master/dontexist.py
> # trying /home/vstinner/prog/python/master/dontexist.pyc
> # trying /home/vstinner/prog/python/master/Lib/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/prog/python/master/Lib/dontexist.abi3.so
> # trying /home/vstinner/prog/python/master/Lib/dontexist.so
> # trying /home/vstinner/prog/python/master/Lib/dontexist.py
> # trying /home/vstinner/prog/python/master/Lib/dontexist.pyc
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.abi3.so
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.so
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.py
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.pyc
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.abi3.so
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.so
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.py
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.pyc
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<frozen importlib._bootstrap>", line 983, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
> ModuleNotFoundError: No module named 'dontexist'
>
> Why do we still check for the .pyc file outside __pycache__ directories?
>
> Why do we have to check for 3 different names for .so files?

Yes, I also cringe every time I trace Python's system calls and see these needless stats and file opens. Unless Python adds the ability to tell the import mechanism what type of module to import, Mercurial will likely modify our custom importer to only look for specific files. We do provide pure Python modules for modules that have C implementations. But we have code that ensures that the C version is loaded for certain Python configurations because we don't want users accidentally using the non-C modules and then complaining about Mercurial's performance! We already denote the set of modules backed by C. What we're missing (but is certainly possible to implement) is code that limits the module finding search depending on whether the module is backed by Python or C. But this only really works for Mercurial's modules: we don't really know what the standard library is doing and coding assumptions into Mercurial about standard library behavior feels dangerous.
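
To give a flavor of what "only look for specific files" could mean, here is a sketch of a meta path finder that maps a known set of top-level pure-Python modules straight to `<directory>/<name>.py`, skipping the per-directory suffix search. The class name `SourceOnlyFinder` and the `demo_mod` module are invented for this example; this is not Mercurial's importer, and a real version would also need to handle packages and C-backed modules.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class SourceOnlyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder for a fixed set of top-level .py modules in one directory."""

    def __init__(self, directory, names):
        self.directory = directory
        self.names = frozenset(names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.names:
            return None  # let the regular finders handle everything else
        filename = os.path.join(self.directory, fullname + ".py")
        # Builds a spec for exactly this file; no other suffixes are tried.
        return importlib.util.spec_from_file_location(fullname, filename)


# Demo: create a throwaway module and import it through the finder.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_mod.py"), "w") as handle:
        handle.write("VALUE = 42\n")
    finder = SourceOnlyFinder(tmp, {"demo_mod"})
    sys.meta_path.insert(0, finder)
    try:
        import demo_mod
    finally:
        sys.meta_path.remove(finder)

print(demo_mod.VALUE)  # 42
```

The source loader still stats the .py and its cached .pyc when loading, so this trims the search, not every system call.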

If we ship our own Python distribution, we'll likely have a jar-like file containing all modules. Determining which file to load will read an in-memory file index and not require any expensive system calls to look for files.

> Does Mercurial need all directories of sys.path?

No and yes. Mercurial by itself can get by with just the standard library and Mercurial's own packages. But extensions change everything. An extension could modify sys.path itself, though, so limiting sys.path inside Mercurial is somewhat reasonable. Still, it's definitely unexpected for a Python application to remove entries from sys.path when it starts.

> What's the status of the "system python" project? :-)
>
> I also would prefer Python without the site module. Can we rewrite
> this module in C maybe? Until recently, the site module was needed on
> Python to create the "mbcs" encoding alias. Hopefully, the feature has
> been removed into Lib/encodings/__init__.py (new private _alias_mbcs()

I also lament the startup time effects of site.py. When `hg` is a Rust binary, we will almost certainly skip site.py and manually perform any required actions that it was performing.
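
For anyone who wants to see the effect on their own machine, the existing `-S` command-line flag (don't import site on startup) gives a quick approximation. A rough sketch, with a made-up helper name `startup_seconds`; wall-clock timings of subprocesses are noisy, so this takes the best of a few runs and the numbers should only be read as a ballpark.

```python
import subprocess
import sys
import time


def startup_seconds(interpreter_args=(), runs=5):
    """Best-of-N wall time to start this interpreter and run `pass`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *interpreter_args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


print("with site:    %.1f ms" % (startup_seconds() * 1000))
print("without site: %.1f ms" % (startup_seconds(["-S"]) * 1000))
```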

> Python 3.7b3+:
>
> $ python3.7 -X importtime -c pass
> import time: self [us] | cumulative | imported package
> import time:        95 |         95 | zipimport
> import time:       589 |        589 | _frozen_importlib_external
> import time:        67 |         67 |     _codecs
> import time:       498 |        565 |   codecs
> import time:       425 |        425 |   encodings.aliases
> import time:       641 |       1629 | encodings
> import time:       228 |        228 | encodings.utf_8
> import time:       143 |        143 | _signal
> import time:       335 |        335 | encodings.latin_1
> import time:        58 |         58 |     _abc
> import time:       265 |        322 |   abc
> import time:       298 |        619 | io
> import time:        69 |         69 |       _stat
> import time:       196 |        265 |     stat
> import time:       169 |        169 |       genericpath
> import time:       336 |        505 |     posixpath
> import time:      1190 |       1190 |     _collections_abc
> import time:       600 |       2557 |   os
> import time:       223 |        223 |   _sitebuiltins
> import time:       214 |        214 |   sitecustomize
> import time:        74 |         74 |   usercustomize
> import time:       477 |       3544 | site

As for things Python could do to make things better, one idea is for "package bundles." Instead of using .py, .pyc, .so, etc files as separate files on the filesystem, allow Python packages to be distributed as standalone "archive" files. Like Java's jar files. This has the advantage that there is only a single place to look for files in a given Python package. And since the bundle is immutable, you can index it so imports don't need to touch the filesystem to discover what is present: you do a quick memory lookup and jump straight to the available file. If you go this route, please don't require the use of zlib for file compression, as zlib is painfully slow compared to alternatives like lz4 and zstandard.
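
As a sketch of the "index it once, then do memory lookups" idea: the snippet below uses a zip archive from the standard library purely as a stand-in container (stored uncompressed), since no bundle format exists yet. The function names `build_index` and `resolve` and the `pkg` layout are invented for illustration.

```python
import io
import zipfile


def build_index(archive):
    """Read the archive's table of contents once; return {member name: ZipInfo}."""
    with zipfile.ZipFile(archive) as bundle:
        return {info.filename: info for info in bundle.infolist()}


def resolve(index, fullname):
    """Map a dotted module name to an archive member using only dict lookups."""
    base = fullname.replace(".", "/")
    for candidate in (base + "/__init__.py", base + ".py"):
        if candidate in index:
            return candidate
    return None


# Build a tiny bundle in memory to exercise the lookup.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as bundle:
    bundle.writestr("pkg/__init__.py", "")
    bundle.writestr("pkg/mod.py", "VALUE = 1\n")

index = build_index(buffer)
print(resolve(index, "pkg"))          # pkg/__init__.py
print(resolve(index, "pkg.mod"))      # pkg/mod.py
print(resolve(index, "pkg.missing"))  # None
```

A miss costs two dictionary lookups and no system calls, compared with the five candidate filenames tried per sys.path entry in the trace above.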

I know this kinda/sorta exists with zipimporter. But zipimporter uses zlib (slow) and only allows .py/.pyc files. And I think some Python application distribution tools have also solved this problem. I'd *really* like to see a proper/robust solution in Python itself.

In that vein, it would be really nice if the "standalone Python application" story were a bit more formalized. From my perspective, it is insanely difficult to package and distribute an application that happens to use Python. It requires vastly different solutions for different platforms. I want to declare a minimal boilerplate somewhere (perhaps in setup.py) and run a command that produces an as-self-contained-as-possible application complete with platform-native installers. Presumably such a self-contained application could take many shortcuts with regard to process startup and mitigate this general problem. Again, Mercurial is trending in the direction of making `hg` a Rust binary and distributing its own Python. Since we have to solve this packaging+distribution problem on multiple platforms, I'll try to keep an eye towards making whatever solution we concoct reusable by other projects.