To reduce Python "application" startup time
Hi,

While I can't attend the sprint, I saw the etherpad and found that Neil Schemenauer and Eric Snow will work on startup time. I want to share my current knowledge about startup time.

For bare (e.g. `python -c pass`) startup time, I'm waiting for the C implementation of ABC. But application startup time is more important, and we can improve it by optimizing imports of common stdlib modules.

The current `python -v` output is not useful for optimizing imports. So I use this patch to profile import time. https://gist.github.com/methane/e688bb31a23bcc437defcea4b815b1eb

With this profile, I tried to optimize `python -c 'import asyncio'`, logging and http.client. https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-opti...

With this small patch:

logging: 14.9ms -> 12.9ms
asyncio: 62.1ms -> 58.2ms
http.client: 43.8ms -> 36.1ms

I haven't created a pull request yet. (Can I create one without an issue, as a trivial patch?)

I'm very busy these days, maybe until December. But I hope this report helps people working on optimizing startup time.

Regards,
INADA Naoki <songofacandy@gmail.com>
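[The linked gist contains the actual profiling patch; for readers who can't follow the link, the idea can be approximated by wrapping builtins.__import__ — a minimal sketch, not INADA's patch:]

```python
import builtins
import time

_original_import = builtins.__import__
import_times = {}  # module name -> cumulative time in ms (includes nested imports)

def _timed_import(name, globals=None, locals=None, fromlist=(), level=0):
    start = time.perf_counter()
    try:
        return _original_import(name, globals, locals, fromlist, level)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        import_times[name] = import_times.get(name, 0.0) + elapsed_ms

builtins.__import__ = _timed_import
try:
    import json  # example workload; every transitive import is recorded too
finally:
    builtins.__import__ = _original_import  # always restore the real hook

for mod, ms in sorted(import_times.items(), key=lambda kv: -kv[1]):
    print(f"{ms:8.3f} ms  {mod}")
```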
On 5 September 2017 at 15:02, INADA Naoki <songofacandy@gmail.com> wrote:
Hi,
[...]
For bare (e.g. `python -c pass`) startup time, I'm waiting for the C implementation of ABC.
Hi, I am not sure I will be able to finish it this week; this also depends on fixing interactions with ABC caches in ``typing`` first (as I mentioned on b.p.o., ``typing`` currently uses the private ABC API "aggressively"). -- Ivan
On 9/5/2017 9:02 AM, INADA Naoki wrote:
But application startup time is more important. And we can improve it by optimizing imports of common stdlib modules.
The current `python -v` output is not useful for optimizing imports. So I use this patch to profile import time. https://gist.github.com/methane/e688bb31a23bcc437defcea4b815b1eb
With this profile, I tried to optimize `python -c 'import asyncio'`, logging and http.client.
https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-opti...
With this small patch:
logging: 14.9ms -> 12.9ms
asyncio: 62.1ms -> 58.2ms
http.client: 43.8ms -> 36.1ms
I haven't created a pull request yet. (Can I create one without an issue, as a trivial patch?)
Trivial, no-issue PRs are meant for things like typo fixes that need no discussion or record.

Moving imports in violation of the PEP 8 rule, "Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants", is not trivial. Doing so voluntarily for speed, as opposed to doing so necessarily to avoid circular import errors, is controversial.

-- Terry Jan Reedy
I haven't created a pull request yet. (Can I create one without an issue, as a trivial patch?)
Trivial, no-issue PRs are meant for things like typo fixes that need no discussion or record.
Moving imports in violation of the PEP 8 rule, "Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants", is not trivial. Doing so voluntarily for speed, as opposed to doing so necessarily to avoid circular import errors, is controversial.
-- Terry Jan Reedy
Makes sense. I'll create an issue for each module if it seems really worthwhile. Thanks,
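[For context, the kind of change under discussion looks like the following — a hypothetical sketch; the function and the choice of socket are illustrative only, not from the actual patch:]

```python
# Top-of-file style (PEP 8): `import socket` here would be paid by every
# program that imports this module, whether or not it uses this function.

def inet_aton_hex(ip: str) -> str:
    # Deferred import: the socket module (which builds several enums at
    # import time) is loaded only when this function is first called.
    # Subsequent calls hit the sys.modules cache, so the cost is one-time.
    import socket
    return socket.inet_aton(ip).hex()

print(inet_aton_hex("127.0.0.1"))  # -> 7f000001
```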
2017-09-05 6:02 GMT-07:00 INADA Naoki <songofacandy@gmail.com>:
Hi,
While I can't attend the sprint, I saw the etherpad and found that Neil Schemenauer and Eric Snow will work on startup time.
I want to share my current knowledge about startup time.
For bare (e.g. `python -c pass`) startup time, I'm waiting for the C implementation of ABC.
But application startup time is more important. And we can improve it by optimizing imports of common stdlib modules.
The current `python -v` output is not useful for optimizing imports. So I use this patch to profile import time. https://gist.github.com/methane/e688bb31a23bcc437defcea4b815b1eb
With this profile, I tried to optimize `python -c 'import asyncio'`, logging and http.client.
https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-optimize-import-patch
This patch moves a few imports inside functions. I wonder whether that kind of change actually helps with real applications—doesn't any real application end up importing the socket module anyway at some point?
With this small patch:
logging: 14.9ms -> 12.9ms
asyncio: 62.1ms -> 58.2ms
http.client: 43.8ms -> 36.1ms
I haven't created a pull request yet. (Can I create one without an issue, as a trivial patch?)
I'm very busy these days, maybe until December. But I hope this report helps people working on optimizing startup time.
Regards,
INADA Naoki <songofacandy@gmail.com>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/jelle.zijlstra%40gmail.com
On Tue, Sep 5, 2017 at 11:13 AM, Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
2017-09-05 6:02 GMT-07:00 INADA Naoki <songofacandy@gmail.com>:
With this profile, I tried to optimize `python -c 'import asyncio'`, logging and http.client.
https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-opti...
This patch moves a few imports inside functions. I wonder whether that kind of change actually helps with real applications—doesn't any real application end up importing the socket module anyway at some point?
I don't know if this particular change is worthwhile, but one place where startup slowness is particularly noticed is with commands like 'foo.py --help' or 'foo.py --generate-completions' (the latter called implicitly by hitting <tab> in some shell), which typically do lots of imports that end up not being used. -n -- Nathaniel J. Smith -- https://vorpus.org
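[A sketch of why deferring imports helps exactly these commands — a hypothetical script, with json standing in for a heavy dependency:]

```python
import argparse

def main(argv=None):
    parser = argparse.ArgumentParser(prog="foo.py", description="demo tool")
    parser.add_argument("--greet", default="world")
    args = parser.parse_args(argv)

    # Heavy imports happen only after parsing succeeds, so `foo.py --help`
    # (which raises SystemExit inside parse_args) never pays for them.
    import json  # stand-in for a heavy dependency
    return json.dumps({"greeting": args.greet})

if __name__ == "__main__":
    print(main())
```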
This patch moves a few imports inside functions. I wonder whether that kind of change actually helps with real applications—doesn't any real application end up importing the socket module anyway at some point?
I don't know if this particular change is worthwhile, but one place where startup slowness is particularly noticed is with commands like 'foo.py --help' or 'foo.py --generate-completions' (the latter called implicitly by hitting <tab> in some shell), which typically do lots of imports that end up not being used.
Yes. And there are worse scenarios.

1. Jinja2 supports asyncio. So it imports asyncio.
2. asyncio imports concurrent.futures, for compatibility with the Future class.
3. The concurrent.futures package does `from concurrent.futures.process import ProcessPoolExecutor`.
4. The concurrent.futures.process package imports multiprocessing.

So when I use Jinja2 but not asyncio or multiprocessing, I need to import a large dependency tree. I want to make the `import asyncio` dependency tree smaller.

FYI, the current version of Jinja2 has a very large regex which takes more than 100ms at import time. It is fixed in the master branch. So if you want to look at Jinja2, please use the master branch.

Regards,
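[The size of such a dependency tree is easy to observe by comparing sys.modules in a fresh interpreter with and without the import — a small measurement sketch:]

```python
import subprocess
import sys

def module_count(setup: str) -> int:
    # Spawn a fresh interpreter so already-cached imports don't hide the
    # cost; count everything that ends up in sys.modules afterwards.
    code = f"{setup}; import sys; print(len(sys.modules))"
    return int(subprocess.check_output([sys.executable, "-c", code]))

bare = module_count("pass")
with_asyncio = module_count("import asyncio")
print(f"bare interpreter: {bare} modules; after 'import asyncio': {with_asyncio}")
```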
On Wed, Sep 6, 2017 at 2:30 PM, INADA Naoki <songofacandy@gmail.com> wrote:
This patch moves a few imports inside functions. I wonder whether that kind of change actually helps with real applications—doesn't any real application end up importing the socket module anyway at some point?
I don't know if this particular change is worthwhile, but one place where startup slowness is particularly noticed is with commands like 'foo.py --help' or 'foo.py --generate-completions' (the latter called implicitly by hitting <tab> in some shell), which typically do lots of imports that end up not being used.
Yes. And there are worse scenarios.
1. Jinja2 supports asyncio. So it imports asyncio.
2. asyncio imports concurrent.futures, for compatibility with the Future class.
3. The concurrent.futures package does `from concurrent.futures.process import ProcessPoolExecutor`.
4. The concurrent.futures.process package imports multiprocessing.
So when I use Jinja2 but not asyncio or multiprocessing, I need to import a large dependency tree. I want to make the `import asyncio` dependency tree smaller.
FYI, the current version of Jinja2 has a very large regex which takes more than 100ms at import time. It is fixed in the master branch. So if you want to look at Jinja2, please use the master branch.
How significant is application startup time to something that uses Jinja2? Are there short-lived programs that use it? Python startup time matters enormously to command-line tools like Mercurial, but far less to something that's designed to start up and then keep running (eg a web app, which is where Jinja is most used). ChrisA
How significant is application startup time to something that uses Jinja2? Are there short-lived programs that use it? Python startup time matters enormously to command-line tools like Mercurial, but far less to something that's designed to start up and then keep running (eg a web app, which is where Jinja is most used).
Since Jinja2 is a very popular template engine, it is used by CLI tools like Ansible.

Additionally, faster startup time (and a smaller memory footprint) is good even for web applications. For example, CGI is still a convenient tool sometimes. Another example is GAE/Python.

Anyway, I think researching the import trees of popular libraries is a good starting point for optimizing startup time. For example, modules like ast and tokenize are imported more often than I thought.

Jinja2 is one of the libraries I often use. I'm checking other libraries like requests.

Thanks,
INADA Naoki <songofacandy@gmail.com>
On Wednesday, September 6, 2017, INADA Naoki <songofacandy@gmail.com> wrote:
How significant is application startup time to something that uses Jinja2? Are there short-lived programs that use it? Python startup time matters enormously to command-line tools like Mercurial, but far less to something that's designed to start up and then keep running (eg a web app, which is where Jinja is most used).
Since Jinja2 is a very popular template engine, it is used by CLI tools like Ansible.
SaltStack uses Jinja2. It really is a good idea to regularly restart the minion processes. Celery can also cycle through worker processes, IIRC.
Additionally, faster startup time (and a smaller memory footprint) is good even for web applications. For example, CGI is still a convenient tool sometimes. Another example is GAE/Python.
Short-lived processes are sometimes preferable from a security standpoint. Python is currently less viable for CGI use than other scripting languages due to startup time. Resource leaks (e.g. memory, file handles, database references; valgrind) don't persist with short-lived CGI processes. If there's ASLR, attacks are also harder. Scale-up operations on e.g. IaaS platforms like Kubernetes and PaaS platforms like AppScale all incur Python startup time on a regular basis.
Anyway, I think researching the import trees of popular libraries is a good starting point for optimizing startup time. For example, modules like ast and tokenize are imported more often than I thought.
Jinja2 is one of the libraries I often use. I'm checking other libraries like requests.
Thanks,
INADA Naoki <songofacandy@gmail.com>
Anyway, I think researching the import trees of popular libraries is a good starting point for optimizing startup time.
I agree -- in this case, you've identified that asyncio is expensive -- good to know. In the Jinja2 case, does it always need asyncio? PEP 8 aside, I think it often makes sense for expensive optional imports to be done only if needed. Perhaps a patch to Jinja2 is in order. -- CHB

For example, modules like ast and tokenize are imported more often than I thought.

Jinja2 is one of the libraries I often use. I'm checking other libraries like requests.
Thanks,
INADA Naoki <songofacandy@gmail.com>
On Sep 6, 2017, at 00:42, INADA Naoki <songofacandy@gmail.com> wrote:
Additionally, faster startup time (and a smaller memory footprint) is good even for web applications. For example, CGI is still a convenient tool sometimes. Another example is GAE/Python.
Anyway, I think researching the import trees of popular libraries is a good starting point for optimizing startup time. For example, modules like ast and tokenize are imported more often than I thought.
Improving start up time may indeed help long running processes, but start up costs will generally be amortized across the lifetime of the process, so it isn’t as noticeable. However, startup time *is* a real issue for command line tools.

I’m not sure however whether burying imports inside functions (as a kind of poor man’s lazy import) is ultimately going to be satisfying. First, it’s not natural, it generally violates coding standards (e.g. PEP 8), and can make linters complain. Second, I think you’ll end up chasing imports all over the stdlib and third party modules in any sufficiently complicated application. Third, I’m not sure that the gains you’ll get won’t just be overwhelmed by lots of other things going on, such as pkg_resources entry point processing, pth file processing, site.py effects, command line processing libraries such as click, and implicitly added distribution exception hooks (e.g. Ubuntu’s apport).

Many of these can’t be blamed on Python itself, but all can contribute significantly to Python’s apparent start up time. It’s definitely worth investigating the details of Python import, and a few of us at the core sprint have looked at those numbers and thrown around ideas for improvement, but we’ll need to look at the effects up and down the stack to improve the start up performance for the average Python application.

Cheers,
-Barry
I’m not sure however whether burying imports inside functions (as a kind of poor man’s lazy import) is ultimately going to be satisfying. First, it’s not natural, it generally violates coding standards (e.g. PEP 8), and can make linters complain.
Of course. I tried to move imports only when (1) the import is used by only one or two of the many functions in the module, (2) it's relatively heavy, and (3) the module is rarely imported from other modules.
Second, I think you’ll end up chasing imports all over the stdlib and third party modules in any sufficiently complicated application.
Agreed. I won't spend much time on optimizing the stdlib by moving imports from the top of the file into functions. I think my import-profiler patch can be polished and committed to Python to help library maintainers measure import time easily. (Maybe `python -X import-profile`.)
Third, I’m not sure that the gains you’ll get won’t just be overwhelmed by lots of other things going on, such as pkg_resources entry point processing, pth file processing, site.py effects, command line processing libraries such as click, and implicitly added distribution exception hooks (e.g. Ubuntu’s apport).
Yes. I noticed some of them while profiling imports. For example, an old-style namespace package imports the types module for types.ModuleType. The types module imports functools, and functools imports collections. So some efforts in CPython (avoiding importing collections and functools from site) aren't worth much when at least one old-style namespace package is installed.
Many of these can’t be blamed on Python itself, but all can contribute significantly to Python’s apparent start up time. It’s definitely worth investigating the details of Python import, and a few of us at the core sprint have looked at those numbers and thrown around ideas for improvement, but we’ll need to look at the effects up and down the stack to improve the start up performance for the average Python application.
Yes. I totally agree with you. That's why I use import-profile.patch on some 3rd party libraries.

Currently, I have these ideas to optimize application startup time:

* Faster, or lazy, compiling of regular expressions. (pkg_resources imports pyparsing, which has a lot of regexes)
* More usable lazy import. (which could be solved by "PEP 549: Instance Properties (aka: module properties)")
* Optimize enum creation.
* Faster namedtuple (there is a pull request already)
* Faster ABC
* Breaking up large import trees in the stdlib. (PEP 549 may help with this too)

Regards,
INADA Naoki <songofacandy@gmail.com>
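[The first idea above — lazily compiling regular expressions — can be sketched with a cached helper; a hypothetical pattern for illustration, not a proposed stdlib change:]

```python
import functools
import re

@functools.lru_cache(maxsize=None)
def _assignment_re():
    # The (potentially expensive) re.compile runs on the first call only,
    # not when this module is imported; lru_cache keeps the result.
    return re.compile(r"(?P<key>\w+)\s*=\s*(?P<value>\S+)")

def parse_assignment(line):
    m = _assignment_re().match(line)
    return (m.group("key"), m.group("value")) if m else None

print(parse_assignment("retries = 3"))  # -> ('retries', '3')
```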
I should mention that I have a prototype design for improving importlib's lazy loading to be easier to turn on and use. See https://notebooks.azure.com/Brett/libraries/di2Btqj7zSI/html/Lazy%20importin... for my current notes. Part of it includes an explicit lazy_import() function which would negate needing to hide imports in functions to delay their importation.

On Wed, 6 Sep 2017 at 20:50 INADA Naoki <songofacandy@gmail.com> wrote:
I’m not sure however whether burying imports inside functions (as a kind of poor man’s lazy import) is ultimately going to be satisfying. First, it’s not natural, it generally violates coding standards (e.g. PEP 8), and can make linters complain.
Of course. I tried to move imports only when (1) the import is used by only one or two of the many functions in the module, (2) it's relatively heavy, and (3) the module is rarely imported from other modules.
Second, I think you’ll end up chasing imports all over the stdlib and third party modules in any sufficiently complicated application.
Agreed. I won't spend much time on optimizing the stdlib by moving imports from the top of the file into functions.
I think my import-profiler patch can be polished and committed to Python to help library maintainers measure import time easily. (Maybe `python -X import-profile`.)
Third, I’m not sure that the gains you’ll get won’t just be overwhelmed by lots of other things going on, such as pkg_resources entry point processing, pth file processing, site.py effects, command line processing libraries such as click, and implicitly added distribution exception hooks (e.g. Ubuntu’s apport).
Yes. I noticed some of them while profiling imports. For example, an old-style namespace package imports the types module for types.ModuleType. The types module imports functools, and functools imports collections. So some efforts in CPython (avoiding importing collections and functools from site) aren't worth much when at least one old-style namespace package is installed.
Many of these can’t be blamed on Python itself, but all can contribute significantly to Python’s apparent start up time. It’s definitely worth investigating the details of Python import, and a few of us at the core sprint have looked at those numbers and thrown around ideas for improvement, but we’ll need to look at the effects up and down the stack to improve the start up performance for the average Python application.
Yes. I totally agree with you. That's why I use import-profile.patch on some 3rd party libraries.
Currently, I have these ideas to optimize application startup time.
* Faster, or lazy, compiling of regular expressions. (pkg_resources imports pyparsing, which has a lot of regexes)
* More usable lazy import. (which could be solved by "PEP 549: Instance Properties (aka: module properties)")
* Optimize enum creation.
* Faster namedtuple (there is a pull request already)
* Faster ABC
* Breaking up large import trees in the stdlib. (PEP 549 may help with this too)
Regards,
INADA Naoki <songofacandy@gmail.com>
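[Brett's lazy_import() prototype isn't shown in the thread, but the stdlib's importlib.util.LazyLoader already supports the same pattern; a sketch along the lines of the documented recipe:]

```python
import importlib.util
import sys

def lazy_import(name):
    """Bind a module now, but defer executing it until first attribute access."""
    spec = importlib.util.find_spec(name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # sets up laziness; module body not run yet
    return module

json = lazy_import("json")         # cheap: the json module body hasn't executed
print(json.dumps({"lazy": True}))  # first attribute access triggers the real load
```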
With this profile, I tried to optimize `python -c 'import asyncio'`, logging and http.client.
https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-opti...
This patch moves a few imports inside functions. I wonder whether that kind of change actually helps with real applications—doesn't any real application end up importing the socket module anyway at some point?
Ah, I'm sorry. It doesn't make importing asyncio, logging and http.client themselves faster. I was looking at pkg_resources. While it's not stdlib, it is imported very often. And it uses email.parser, but doesn't require socket or random. Since the socket module creates some enums, removing the import saves a few milliseconds. Regards,
INADA Naoki <songofacandy@gmail.com> wrote:
The current `python -v` output is not useful for optimizing imports. So I use this patch to profile import time. https://gist.github.com/methane/e688bb31a23bcc437defcea4b815b1eb
I have implemented DTrace probes that do almost the same thing. Your patch is better in that it does not require an OS with DTrace or SystemTap. The DTrace probes are better in that they can be a part of the standard Python build.

https://github.com/nascheme/cpython/tree/dtrace-module-import
DTrace script: https://gist.github.com/nascheme/c1cece36a3369926ee93cecc3d024179
Pretty printer for script output (very minimal): https://gist.github.com/nascheme/0bff5c49bb6b518f5ce23a9aea27f14b
On 05.09.17 16:02, INADA Naoki wrote:
While I can't attend the sprint, I saw the etherpad and found that Neil Schemenauer and Eric Snow will work on startup time.
I want to share my current knowledge about startup time.
For bare (e.g. `python -c pass`) startup time, I'm waiting for the C implementation of ABC.
But application startup time is more important. And we can improve it by optimizing imports of common stdlib modules.
The current `python -v` output is not useful for optimizing imports. So I use this patch to profile import time. https://gist.github.com/methane/e688bb31a23bcc437defcea4b815b1eb
With this profile, I tried to optimize `python -c 'import asyncio'`, logging and http.client.
https://gist.github.com/methane/1ab97181e74a33592314c7619bf34233#file-0-opti...
With this small patch:
logging: 14.9ms -> 12.9ms
asyncio: 62.1ms -> 58.2ms
http.client: 43.8ms -> 36.1ms
See also https://bugs.python.org/issue30152, which optimizes the import time of argparse using a similar technique. I think these patches overlap.
participants (12)

- Barry Warsaw
- Brett Cannon
- Chris Angelico
- Chris Barker - NOAA Federal
- INADA Naoki
- Ivan Levkivskyi
- Jelle Zijlstra
- Nathaniel Smith
- Neil Schemenauer
- Serhiy Storchaka
- Terry Reedy
- Wes Turner