[Python-Dev] Python startup time

David Mertz mertz at gnosis.cx
Fri Jul 21 03:12:20 EDT 2017


How implausible is it to write out the actual memory image of a loaded
Python process? I.e. on a specific machine, OS, Python version, etc.?
This adds overhead only on the initial run, of course; on subsequent runs
it's just one memory map, which is about the cheapest possible operation.

E.g.

$ python3.7 --write-image "import typing, re, os, numpy"

I imagine this creating a file like:

/tmp/__python__/python37-typing-re-os-numpy.mem

Then it would terminate as if just that one line had run, however long
that takes (but snapshot the memory image before exit).

Then subsequent invocations would only restore the image to memory. Maybe:

$ pyrunner --load-image python37-typing-re-os-numpy myscript.py

The last line could be aliased, of course. I suppose we'd need to check
whether the relevant image file exists, and if not, fall back to ignoring
the '--load-image' flag and running plain old Python.
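
In sketch form, the wrapper logic I have in mind would be something like
this (entirely hypothetical: neither pyrunner, the .mem image format, nor
the restore step exist; "pyrunner-restore" is a made-up placeholder):

#!/usr/bin/env python3
# Hypothetical pyrunner: use a saved interpreter image if present,
# otherwise fall back to a plain CPython run.
import os
import subprocess
import sys

IMAGE_DIR = "/tmp/__python__"  # hypothetical image cache location

def main():
    image_name, script, *args = sys.argv[1:]
    image = os.path.join(IMAGE_DIR, image_name + ".mem")
    if os.path.exists(image):
        # Imaginary restore step: map the snapshot back into memory
        # and hand control to the script.
        cmd = ["pyrunner-restore", image, script, *args]
    else:
        # No image written yet: ignore the image request and run
        # plain old Python.
        cmd = [sys.executable, script, *args]
    sys.exit(subprocess.call(cmd))

if __name__ == "__main__":
    main()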

This helps not at all for something like AWS Lambda where each instance is
spun up fresh. But for the use-case of running many Python shell commands
at an interactive shell on one machine, it seems like that could be very
fast.

In my hypothetical, I suppose some collection of modules gets pre-loaded
into the image. Of course, a script may need to load others, and it may
not use some that are in the image. But under this idea, users could
decide for themselves which modules they typically need.
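
(As a rough way to choose, one can just ask a session what it actually
loaded -- sys.modules holds every module a given set of imports pulled
in, which is what the image would need to contain; assuming numpy is
installed:)

$ python3 -c "import typing, re, os, numpy; import sys; print(sorted(sys.modules))"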

On Jul 20, 2017 11:27 PM, "Nick Coghlan" <ncoghlan at gmail.com> wrote:

> On 21 July 2017 at 15:30, Cesare Di Mauro <cesare.di.mauro at gmail.com>
> wrote:
>
>>
>>
>> 2017-07-21 4:52 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:
>>
>>> On 21 July 2017 at 12:44, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> > We can separately measure the cost of unmarshalling the code object:
>>> >
>>> > $ python3 -m perf timeit -s "import typing; from marshal import loads; from importlib.util import cache_from_source; cache = cache_from_source(typing.__file__); data = open(cache, 'rb').read()[12:]" "loads(data)"
>>> > .....................
>>> > Mean +- std dev: 286 us +- 4 us
>>>
>>> Slight adjustment here, as the cost of locating the cached bytecode
>>> and reading it from disk should really be accounted for in each
>>> iteration:
>>>
>>> $ python3 -m perf timeit -s "import typing; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(typing.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
>>> .....................
>>> Mean +- std dev: 337 us +- 8 us
>>>
>>> That will have a bigger impact when loading from spinning disk or a
>>> network drive, but it's fairly negligible when loading from a local
>>> SSD or an already primed filesystem cache.
>>>
>>> Cheers,
>>> Nick.
>>>
>>> --
>>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>>>
>> Thanks for your tests, Nick. It's quite evident that the marshal code
>> cannot improve the situation, so I withdraw my proposal.
>>
>
> It was still a good suggestion, since it made me realise I *hadn't*
> actually measured the relative timings lately, so it was technically an
> untested assumption that module level code execution still dominated the
> overall import time.
>
> typing is also a particularly large & complex module, and bytecode
> unmarshalling represents a larger fraction of the import time for simpler
> modules like abc:
>
> $ python3 -m perf timeit -s "import abc; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(abc.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
> .....................
> Mean +- std dev: 45.2 us +- 1.1 us
>
> $ python3 -m perf timeit -s "import abc; loader_exec = abc.__spec__.loader.exec_module" "loader_exec(abc)"
> .....................
> Mean +- std dev: 172 us +- 5 us
>
> $ python3 -m perf timeit -s "import abc; from importlib import reload" "reload(abc)"
> .....................
> Mean +- std dev: 280 us +- 14 us
>
> And _weakrefset:
>
> $ python3 -m perf timeit -s "import _weakrefset; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(_weakrefset.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
> .....................
> Mean +- std dev: 57.7 us +- 1.3 us
>
> $ python3 -m perf timeit -s "import _weakrefset; loader_exec = _weakrefset.__spec__.loader.exec_module" "loader_exec(_weakrefset)"
> .....................
> Mean +- std dev: 129 us +- 6 us
>
> $ python3 -m perf timeit -s "import _weakrefset; from importlib import reload" "reload(_weakrefset)"
> .....................
> Mean +- std dev: 226 us +- 4 us
>
> The conclusion still holds (the absolute numbers here are likely still too
> small for the extra complexity of parallelising bytecode loading to pay off
> in any significant way), but it also helps us set reasonable expectations
> around how much of a gain we're likely to be able to get just from
> precompilation with Cython.
>
> That does actually raise a small microbenchmarking problem: for source and
> bytecode imports, we can force the import system to genuinely rerun the
> module or unmarshal the bytecode inside a single Python process, allowing
> perf to measure it independently of CPython startup. While I'm pretty sure
> it's possible to trick the import machinery into rerunning module level
> init functions even for old-style extension modules (hence allowing us to
> run similar tests to those above for a Cython compiled module), I don't
> actually remember how to do it off the top of my head.
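>
> (For the source/bytecode case, a hedged sketch of one way to force a
> genuine rerun, distinct from the reload-based commands above: evict the
> module from sys.modules in each iteration, so the full find/load/exec
> sequence repeats:
>
> $ python3 -m perf timeit -s "import sys, importlib" "sys.modules.pop('abc', None); importlib.import_module('abc')"
>
> As far as I know that still won't re-run the init function of an
> old-style single-phase extension module, since CPython keeps its own
> cache of their initialised state, which is exactly the problem described
> above.)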
>
> Cheers,
> Nick.
>
> P.S. I'll also note that in these cases where the import overhead is
> proportionally significant for always-imported modules, we may want to look
> at the benefits of freezing them (if they otherwise remain as pure Python
> modules), or compiling them as builtin modules (if we switch them over to
> Cython), in addition to looking at ways to make the modules themselves
> faster. Being built directly into the interpreter binary is pretty much the
> best case scenario for reducing import overhead.
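>
> A quick illustrative check of which category a module falls into today
> (hedged, CPython-specific, and not part of the measurements above):
> built-in modules are listed in sys.builtin_module_names, and a module
> spec's origin is a filesystem path for source modules but the string
> 'frozen' for frozen ones:
>
> $ python3 -c "import sys; print('_weakref' in sys.builtin_module_names)"
> $ python3 -c "from importlib.util import find_spec; print(find_spec('abc').origin)"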
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>