On Tue, May 1, 2018 at 11:55 PM, Ray Donnelly <mingw.android@gmail.com> wrote:
> Is your Python interpreter statically linked? The Python 3 ones from the Anaconda distribution (use Miniconda!) are for Linux and macOS, and that roughly halved our startup times.
My Python interpreters use a shared library. I'll definitely investigate the performance of a statically linked interpreter. Correct me if I'm wrong, but aren't there downsides with regard to C extension compatibility when there is no shared libpython? Or does all the packaging tooling "just work" without a libpython? (It's possible I have my wires crossed with something else regarding statically linked Pythons.)

On Wed, May 2, 2018 at 2:26 AM, Victor Stinner <vstinner@redhat.com> wrote:
> What do you propose to make Python startup faster?
That's a very good question. I'm not sure I'm able to answer it, because I haven't dug into CPython's internals much beyond what is required to implement C extensions. But I can share insight from what the Mercurial project has collectively learned.
> As I wrote in my previous emails, many Python core developers care about startup time, and we are working on making it faster.
> INADA Naoki added -X importtime to identify slow imports and understand where Python spends its startup time.
-X importtime is a great start! For a follow-up enhancement, it would be useful to see which aspects of an import are slow. Is it finding modules (which involves filesystem I/O)? Is it unmarshaling pyc files? Is it executing the module code? And if executing code is slow, what part: inline statements/expressions? Compiling types?

Printing the microseconds it takes to import a module is useful, but it only gives me a general direction. I want to know which parts of the import made it slow, so I know whether I should be focusing on code running during module import, slimming down the size of a module, eliminating the module import from fast paths, pursuing alternative module importers, etc.
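To make that concrete, here is a rough sketch of the kind of instrumentation I have in mind: a wrapper meta path finder that reports the time spent locating a module separately from the time spent executing it. This is not part of CPython; the `PhaseTimer` name is made up, it only wraps per-module loader instances that expose `exec_module`, and it lumps unmarshaling in with execution:

```
# Hypothetical sketch: report "find" time and "exec" time per import.
import sys
import time


class PhaseTimer:
    def find_spec(self, name, path, target=None):
        start = time.perf_counter()
        spec = None
        for finder in sys.meta_path:
            if finder is self:
                continue
            find_spec = getattr(finder, "find_spec", None)
            if find_spec is None:
                continue
            found = find_spec(name, path, target)
            if found is not None:
                spec = found
                break
        find_us = (time.perf_counter() - start) * 1e6
        if spec is None:
            return None
        loader = spec.loader
        # Wrap per-module loader instances only; patching class-level
        # loaders (builtins, frozen modules) would affect every module.
        if (loader is not None and not isinstance(loader, type)
                and hasattr(loader, "exec_module")):
            original = loader.exec_module

            def timed_exec(module, _orig=original, _find=find_us):
                t0 = time.perf_counter()
                _orig(module)
                exec_us = (time.perf_counter() - t0) * 1e6
                print(f"{module.__name__}: find {_find:.0f}us, "
                      f"exec {exec_us:.0f}us", file=sys.stderr)

            loader.exec_module = timed_exec
        return spec


sys.meta_path.insert(0, PhaseTimer())
```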
> Recent example: Barry Warsaw identified that pkg_resources is slow and added importlib.resources to Python 3.7: https://docs.python.org/dev/library/importlib.html#module-importlib.resources
> Brett Cannon has also been working on a standard solution for lazy imports for many years:
> https://pypi.org/project/modutil/
> https://snarky.ca/lazy-importing-in-python-3-7/
Mercurial has used lazy module imports for years. On Python 2.7.14, it reduces `hg version` from ~160ms to ~55ms (~34% of original). On Python 3, we're using `importlib.util.LazyLoader` (a usage sketch follows below), and it reduces `hg version` on 3.7 from ~245ms to ~120ms (~49% of original). I'm not sure why Python 3's built-in lazy importer doesn't yield the speedup that our custom Python 2 importer does. One explanation is that our custom importer is more advanced than importlib's. Another is that Python 3's import mechanism is slower (possibly due to being written in Python instead of C). We haven't yet spent much time optimizing Mercurial for Python 3: our immediate goal is to get it working first. Given the startup performance problem on Python 3, it is only a matter of time before we dig into this further.

It's worth noting that lazy module importing can be undone via common patterns. Most commonly, `from foo import X`. It's *really* difficult to implement a proper object proxy. Mercurial's lazy importer gives up in this case, imports the module immediately, and exports the symbol. (But if the imported module is a package, we detect that and make the module's exports proxies to a lazy module.)

Another common way to undermine the lazy importer is code that accesses a module's attributes at import time (during module exec). e.g.

```
import foo

class myobject(foo.Foo):
    pass
```

Mercurial goes out of its way to avoid these patterns so modules can be delay-imported as much as possible. As long as import times are problematic, it would be helpful if the standard library adopted similar patterns, although I recognize there are backwards compatibility concerns that tie your hands a bit.
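For readers unfamiliar with it, here is a minimal sketch of the `importlib.util.LazyLoader` approach, following the recipe in the importlib documentation; the `lazy_import` helper name is mine:

```
# Minimal LazyLoader usage: the module object exists immediately, but
# its code doesn't run until the first attribute access.
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring its execution until first use."""
    spec = importlib.util.find_spec(name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # sets up laziness; no code runs yet
    return module


json = lazy_import("json")      # cheap: nothing executed yet
print(json.dumps({"hg": 1}))    # first access triggers the real import
```

Note how `from foo import X` defeats this: accessing `X` forces the real import immediately, which is exactly the "undoing" pattern described above.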
> Nick Coghlan is working on the C API to configure Python startup: PEP 432. When it is ready, maybe Mercurial could use a custom Python optimized for its use case.
That looks great! The direction Mercurial is going in is that `hg` will likely become a Rust binary (instead of a #!python script) that will use an embedded Python interpreter, so we will have low-level control over the interpreter via the C API.

I'd also like to see us distribute a copy of Python in our official builds. This will allow us to take various shortcuts, such as not having to probe various sys.path entries, since certain packages can only exist in one place. I'd love to get to the state Google is at, where they have self-contained binaries with ELF sections containing Python modules. But that requires a bit of very low-level hacking. We'll likely have a Rust binary (that possibly statically links libpython) and a separate JAR/zip-like file containing resources.

But many people obtain Python via their system package manager, and no matter how hard we scream that Mercurial is a standalone application, they will configure their packages to link against the system libpython and use the system Python's standard library. This will potentially undo many of our startup time wins.
> IMHO the Python import system is inefficient. We try too many alternative names.
> Example with Python 3.8:
>
> $ ./python -vv
> import dontexist
> # trying /home/vstinner/prog/python/master/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/prog/python/master/dontexist.abi3.so
> # trying /home/vstinner/prog/python/master/dontexist.so
> # trying /home/vstinner/prog/python/master/dontexist.py
> # trying /home/vstinner/prog/python/master/dontexist.pyc
> # trying /home/vstinner/prog/python/master/Lib/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/prog/python/master/Lib/dontexist.abi3.so
> # trying /home/vstinner/prog/python/master/Lib/dontexist.so
> # trying /home/vstinner/prog/python/master/Lib/dontexist.py
> # trying /home/vstinner/prog/python/master/Lib/dontexist.pyc
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.abi3.so
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.so
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.py
> # trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.pyc
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.cpython-38dm-x86_64-linux-gnu.so
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.abi3.so
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.so
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.py
> # trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.pyc
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<frozen importlib._bootstrap>", line 983, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
> ModuleNotFoundError: No module named 'dontexist'
> Why do we still check for the .pyc file outside __pycache__ directories?
>
> Why do we have to check for 3 different names for .so files?
Yes, I also cringe every time I trace Python's system calls and see these needless stat() calls and file opens. Unless Python adds the ability to tell the import mechanism what type of module to import, Mercurial will likely modify our custom importer to only look for specific files.

We do provide pure Python modules for modules that have C implementations. But we have code that ensures the C version is loaded for certain Python configurations, because we don't want users accidentally using the non-C modules and then complaining about Mercurial's performance! We already denote the set of modules backed by C. What we're missing (but is certainly possible to implement) is code that limits the module finding search depending on whether the module is backed by Python or C; a sketch of the idea follows below. But this only really works for Mercurial's modules: we don't really know what the standard library is doing, and coding assumptions about standard library behavior into Mercurial feels dangerous.

If we ship our own Python distribution, we'll likely have a jar-like file containing all modules. Determining which file to load will read an in-memory file index and not require any expensive system calls to look for files.
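As a hypothetical illustration of that "only look for specific files" idea (the index contents and paths below are invented placeholders, not Mercurial's actual layout), a meta path finder could consult a precomputed module-to-file map so the import system never probes alternative suffixes:

```
# Hypothetical: a finder backed by a precomputed module -> file index,
# so each import is one dict lookup and one spec, with no suffix
# guessing and no stat() storm. In practice the index would be
# generated at build time; these entries are placeholders.
import importlib.util
import sys

MODULE_INDEX = {
    "mercurial.node": "/usr/lib/mercurial/mercurial/node.py",
    "mercurial.parsers":
        "/usr/lib/mercurial/mercurial/parsers.cpython-37m-x86_64-linux-gnu.so",
}


class IndexedFinder:
    def find_spec(self, name, path, target=None):
        filename = MODULE_INDEX.get(name)
        if filename is None:
            return None  # not ours; fall through to the normal finders
        # spec_from_file_location picks the right loader from the suffix,
        # so .py and .so modules both work from the same index.
        return importlib.util.spec_from_file_location(name, filename)


sys.meta_path.insert(0, IndexedFinder())
```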
> Does Mercurial need all directories of sys.path?
No and yes. Mercurial by itself can get by with just the standard library and Mercurial's own packages. But extensions change everything, since they can live anywhere on the filesystem. That said, an extension could modify sys.path itself, so limiting sys.path inside Mercurial is somewhat reasonable, although it's definitely unexpected for a Python application to remove entries from sys.path when it starts.
> What's the status of the "system python" project? :-)
> I also would prefer Python without the site module. Can we maybe rewrite this module in C? Until recently, the site module was needed on Windows to create the "mbcs" encoding alias. Happily, that feature has been moved into Lib/encodings/__init__.py (new private _alias_mbcs() function).
I also lament the startup time effects of site.py. When `hg` is a Rust binary, we will almost certainly skip site.py and manually perform any required actions that it was performing.
> Python 3.7b3+:
>
> $ python3.7 -X importtime -c pass
> import time: self [us] | cumulative | imported package
> import time:        95 |         95 | zipimport
> import time:       589 |        589 | _frozen_importlib_external
> import time:        67 |         67 | _codecs
> import time:       498 |        565 | codecs
> import time:       425 |        425 | encodings.aliases
> import time:       641 |       1629 | encodings
> import time:       228 |        228 | encodings.utf_8
> import time:       143 |        143 | _signal
> import time:       335 |        335 | encodings.latin_1
> import time:        58 |         58 | _abc
> import time:       265 |        322 | abc
> import time:       298 |        619 | io
> import time:        69 |         69 | _stat
> import time:       196 |        265 | stat
> import time:       169 |        169 | genericpath
> import time:       336 |        505 | posixpath
> import time:      1190 |       1190 | _collections_abc
> import time:       600 |       2557 | os
> import time:       223 |        223 | _sitebuiltins
> import time:       214 |        214 | sitecustomize
> import time:        74 |         74 | usercustomize
> import time:       477 |       3544 | site
As for things Python could do to make things better, one idea is "package bundles." Instead of shipping .py, .pyc, .so, etc. as separate files on the filesystem, allow Python packages to be distributed as standalone "archive" files, like Java's jar files. This has the advantage that there is only a single place to look for files in a given Python package. And since the bundle is immutable, you can index it so imports don't need to touch the filesystem to discover what is present: you do a quick memory lookup and jump straight to the available file. If you go this route, please don't require the use of zlib for file compression, as zlib is painfully slow compared to alternatives like lz4 and zstandard. I know this kinda/sorta exists with zipimporter. But zipimporter uses zlib (slow) and only allows .py/.pyc files. And I think some Python application distribution tools have also solved this problem. I'd *really* like to see a proper/robust solution in Python itself. (A toy sketch of the idea follows at the end of this message.)

Along that vein, it would be really nice if the "standalone Python application" story were a bit more formalized. From my perspective, it is insanely difficult to package and distribute an application that happens to use Python. It requires vastly different solutions for different platforms. I want to declare some minimal boilerplate somewhere (perhaps in setup.py) and run a command that produces an as-self-contained-as-possible application, complete with platform-native installers. Presumably such a self-contained application could take many shortcuts with regards to process startup and mitigate this general problem.

Again, Mercurial is trending in the direction of making `hg` a Rust binary and distributing its own Python. Since we have to solve this packaging+distribution problem on multiple platforms, I'll try to keep an eye towards making whatever solution we concoct reusable by other projects.
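To make the "package bundle" idea concrete, here is a toy sketch of an invented format (this is not an existing file format or API beyond the importlib pieces): all of a package's precompiled code objects live in one file behind an in-memory index, so an import is a dict lookup plus a marshal load, with no per-suffix stat() calls and no zlib:

```
# Toy "bundle" format: a marshaled dict of module name -> code bytes.
import importlib.abc
import importlib.util
import marshal
import sys


def write_bundle(path, sources):
    """sources: {module_name: source_text} -> one bundle file."""
    entries = {
        name: marshal.dumps(compile(src, "<bundle:%s>" % name, "exec"))
        for name, src in sources.items()
    }
    with open(path, "wb") as f:
        marshal.dump(entries, f)  # index and payload in one structure


class BundleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def __init__(self, path):
        with open(path, "rb") as f:
            self._entries = marshal.load(f)  # one read at startup

    def find_spec(self, name, path, target=None):
        if name not in self._entries:
            return None
        return importlib.util.spec_from_loader(name, self)

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        code = marshal.loads(self._entries[module.__name__])
        exec(code, module.__dict__)


# Usage: build once, then import with zero filesystem probing.
write_bundle("demo.bundle", {"greetings": "def hi():\n    return 'hi'\n"})
sys.meta_path.insert(0, BundleFinder("demo.bundle"))
import greetings
print(greetings.hi())
```

A real design would of course need versioning, packages and extension modules, an mmap-friendly layout, and a fast compressor (lz4/zstandard) for the payloads; the point is that the index makes module lookup a memory operation.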