python process creation overhead

I was investigating speeding up Mercurial's test suite (it runs ~13,000 Python processes) and I believe I've identified CPython process/interpreter creation and destruction as sources of significant overhead and thus a concern for any CPython user. Full details are at [1].

tl;dr 10-18% of CPU time in test suite execution was spent doing the equivalent of `python -c 1`, and 30-38% of CPU time was spent doing the equivalent of `hg version` (I suspect this is mostly dominated by module importing - and Mercurial has lazy module importing).

My measurements show CPython is significantly slower than Perl and Ruby when it comes to process/interpreter creation/destruction. Furthermore, Python 3 appears to be >50% slower than Python 2. Any system spawning many Python processes will likely feel this overhead.

I'm also a contributor to Firefox's build and testing infrastructure. Those systems collectively spawn thousands of Python processes. While I haven't yet measured, I fear that CPython process overhead is also contributing to significant overhead there.

While more science and knowledge should be added before definite conclusions are reached, I'd like to ring the figurative bell about this problem. This apparent inefficiency has me concerned about the use of CPython in scenarios where process spawning is a common operation and decent latency is important. This includes CLI tools, build systems, and testing. I'm curious if others feel this is a problem and what steps can be taken to mitigate or correct it.

[1] http://www.selenic.com/pipermail/mercurial-devel/2014-May/058854.html

Gregory Szorc
gregory.szorc@gmail.com

P.S. I'm not subscribed, so please CC me on responses.
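For anyone who wants to reproduce the bare-interpreter numbers, a minimal sketch of the kind of measurement involved (the 13,000 figure is the test suite's process count mentioned above; absolute numbers will vary by machine):

    # Minimal sketch: time N bare "python -c 1" spawns using the interpreter
    # running this script, then scale to the ~13,000 processes the test
    # suite starts. This only illustrates the method; numbers vary widely.
    import subprocess
    import sys
    import time

    N = 100
    start = time.time()
    for _ in range(N):
        subprocess.check_call([sys.executable, "-c", "1"])
    per_process = (time.time() - start) / N

    print("%.1f ms per process" % (per_process * 1000))
    print("~%.0f seconds of overhead across 13,000 processes"
          % (per_process * 13000))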

Hello,

On Sat, 10 May 2014 13:05:54 -0700, Gregory Szorc <gregory.szorc@gmail.com> wrote:
I was investigating speeding up Mercurial's test suite (it runs ~13,000 Python processes) and I believe I've identified CPython process/interpreter creation and destruction as sources of significant overhead and thus a concern for any CPython user.
This was recently discussed in https://mail.python.org/pipermail/python-dev/2014-April/134028.html

To give numbers:

    $ time python -c pass
    real    0m0.012s
    user    0m0.008s
    sys     0m0.004s

    $ time python -c "from mercurial import demandimport; demandimport.enable(); from mercurial.dispatch import request, dispatch; dispatch(request(['version']))"
    Mercurial Distributed SCM (version 3.0-rc+2-d924e387604f)
    (see http://mercurial.selenic.com for more information)

    Copyright (C) 2005-2014 Matt Mackall and others
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    real    0m0.056s
    user    0m0.044s
    sys     0m0.012s

So, even for "hg version", most of the time is spent initializing Mercurial and its modules, not initializing Python itself. In this precise measurement, an infinitely fast Python startup would make "hg version" 20% faster (and other hg commands probably even less, since they have to load more Mercurial code and data).

I'm wondering why Mercurial couldn't have a transparent daemon mode, where each "hg" invocation on the CLI would only load whatever is required to communicate with the persistent daemon process. It already has a commandserver.
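The import cost can also be seen from inside a single process; a minimal sketch, assuming the mercurial package is importable on the machine doing the measuring:

    # Time the mercurial.dispatch import on its own, inside one process.
    # Assumes the mercurial package is importable here; it is the module
    # the "hg version" command above ends up loading.
    import time

    start = time.time()
    import mercurial.dispatch
    elapsed_ms = (time.time() - start) * 1000
    print("importing mercurial.dispatch: %.1f ms" % elapsed_ms)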
P.S. I'm not subscribed, so please CC me on responses.
I would suggest following the list using the gmane NNTP gateway.

Regards,

Antoine.

On 10 May 2014 22:51, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
Furthermore, Python 3 appears to be >50% slower than Python 2.
Please mention the minor version. It looks like you compared 2.7 and 3.3. Please test 3.4; we made interesting progress on the startup time.

There is still something to do, especially on OS X. Depending on the OS, different modules are loaded and some functions are implemented differently.

Victor

On May 10, 2014, at 5:46 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
On 10 May 2014 22:51, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
Furthermore, Python 3 appears to be >50% slower than Python 2.
Please mention the minor version. It looks like you compared 2.7 and 3.3. Please test 3.4, we made interesting progress on the startup time.
There is still something to do, especially on OS X. Depending on the OS, different modules are loaded and some functions are implemented differently.
For what it's worth, pip is the same way: about half of our test suite involves invoking (multiple) python processes. This has historically been really slow (~30 minutes to run ~200 tests of that type). We've been able to get the wall clock run time down by parallelizing these, but the sequential time is still really slow.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Yeah, but 200 tests in 30 minutes is 9 *seconds* per test -- the Python startup time is only a tiny fraction of that (20-40 *milliseconds*).

On Sat, May 10, 2014 at 3:33 PM, Donald Stufft <donald@stufft.io> wrote:
On May 10, 2014, at 5:46 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
On 10 May 2014 22:51, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
Furthermore, Python 3 appears to be >50% slower than Python 2.
Please mention the minor version. It looks like you compared 2.7 and 3.3. Please test 3.4, we made interesting progress on the startup time.
There is still something to do, especially on OS X. Depending on the OS, different modules are loaded and some functions are implemented differently.
For what it's worth, pip is the same way: about half of our test suite involves invoking (multiple) python processes. This has historically been really slow (~30 minutes to run ~200 tests of that type). We've been able to get the wall clock run time down by parallelizing these, but the sequential time is still really slow.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
--
--Guido van Rossum (python.org/~guido)

Yes, right; sorry, I didn't mean to imply that all of that time was spent in Python startup. I've personally never actually spent time figuring out which part of it is slow, because getting visibility inside a subprocess.Popen is a pain, and I'm slowly trying to rewrite our tests to not require subprocesses at all. I was only trying to chime in with a "me too" about slow subprocess tests, since our test suite ends up starting a couple thousand subprocesses as well and those tests are definitely our slowest.

On May 10, 2014, at 6:50 PM, Guido van Rossum <guido@python.org> wrote:
Yeah, but 200 tests in 30 minutes is 9 *seconds* per test -- the Python startup time is only a tiny fraction of that (20-40 *milliseconds*).
On Sat, May 10, 2014 at 3:33 PM, Donald Stufft <donald@stufft.io> wrote:
On May 10, 2014, at 5:46 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
On 10 May 2014 22:51, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
Furthermore, Python 3 appears to be >50% slower than Python 2.
Please mention the minor version. It looks like you compared 2.7 and 3.3. Please test 3.4, we made interesting progress on the startup time.
There is still something to do, especially on OS X. Depending on the OS, different modules are loaded and some functions are implemented differently.
For what it's worth, pip is the same way: about half of our test suite involves invoking (multiple) python processes. This has historically been really slow (~30 minutes to run ~200 tests of that type). We've been able to get the wall clock run time down by parallelizing these, but the sequential time is still really slow.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
--
--Guido van Rossum (python.org/~guido)
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 5/10/2014 2:46 PM, Victor Stinner wrote:
On 10 May 2014 22:51, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
Furthermore, Python 3 appears to be >50% slower than Python 2.
Please mention the minor version. It looks like you compared 2.7 and 3.3. Please test 3.4, we made interesting progress on the startup time.
There is still something to do, especially on OS X. Depending on the OS, different modules are loaded and some functions are implemented differently.
3.4.0 does appear to be faster than 3.3.5 on Linux - `python -c ''` is taking ~50ms (as opposed to ~60ms) on my i7-2600K. Great to see! But 3.4.0 is still slower than 2.7.6. And all versions of CPython are over 3x slower than Perl 5.18.2. This difference amounts to minutes of CPU time when thousands of processes are involved. That seems excessive to me. Why can't Python start as quickly as Perl or Ruby?
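To make this comparison easy to rerun, a rough sketch follows; the interpreter names below are assumptions, and any that aren't installed are simply skipped:

    # Rough sketch of the cross-interpreter startup comparison. The
    # interpreter names are assumptions; missing ones are skipped.
    import subprocess
    import time

    CANDIDATES = [
        ["python2.7", "-c", "pass"],
        ["python3.4", "-c", "pass"],
        ["perl", "-e", "1"],
        ["ruby", "-e", "exit"],
    ]

    for argv in CANDIDATES:
        try:
            start = time.time()
            for _ in range(50):
                subprocess.check_call(argv)
        except OSError:
            continue  # interpreter not installed here
        print("%-10s %.1f ms per process"
              % (argv[0], (time.time() - start) * 1000 / 50))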

On Mon, May 12, 2014 at 04:22:52PM -0700, Gregory Szorc wrote:
Why can't Python start as quickly as Perl or Ruby?
On my heavily abused Core 2 MacBook with 9 .pth files, 2.7 drops from 81ms startup to 20ms by specifying -S, which disables site.py.

Obliterating the .pth files immediately knocks 10ms off regular startup. I guess the question isn't why Python is slower than Perl, but what aspects of site.py could be cached, reimplemented, or stripped out entirely. I'd personally love to see .pth support removed.

David
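A small sketch of how to see which .pth files site.py is being asked to process (site.getsitepackages() is assumed to be available, as on a stock CPython install; some virtualenv setups lack it):

    # List the .pth files site.py will scan at interpreter startup.
    # Assumes site.getsitepackages() exists (stock CPython; some
    # virtualenv setups don't provide it).
    import glob
    import os
    import site

    for directory in site.getsitepackages():
        for pth in sorted(glob.glob(os.path.join(directory, "*.pth"))):
            print(pth)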

On 13 May 2014 10:19, <dw+python-dev@hmmz.org> wrote:
On Mon, May 12, 2014 at 04:22:52PM -0700, Gregory Szorc wrote:
Why can't Python start as quickly as Perl or Ruby?
On my heavily abused Core 2 Macbook with 9 .pth files, 2.7 drops from 81ms startup to 20ms by specifying -S, which disables site.py.
Obliterating the .pth files immediately knocks 10ms off regular startup. I guess the question isn't why Python is slower than Perl, but what aspects of site.py could be cached, reimplemented, or stripped out entirely. I'd personally love to see .pth support removed.
The startup code is currently underspecified and underdocumented, and quite fragile as a result. It represents 20+ years of organic growth without any systematic refactoring to simplify and streamline things.

That's what PEP 432 aims to address, and is something I now expect to have time to get back to for Python 3.5. And yes, one thing those changes should enable is the creation of system Python runtimes on Linux that initialise faster than the current implementation.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 5/12/2014 8:43 PM, Nick Coghlan wrote:
On 13 May 2014 10:19, <dw+python-dev@hmmz.org> wrote:
On Mon, May 12, 2014 at 04:22:52PM -0700, Gregory Szorc wrote:
Why can't Python start as quickly as Perl or Ruby?
On my heavily abused Core 2 Macbook with 9 .pth files, 2.7 drops from 81ms startup to 20ms by specifying -S, which disables site.py.
Obliterating the .pth files immediately knocks 10ms off regular startup. I guess the question isn't why Python is slower than Perl, but what aspects of site.py could be cached, reimplemented, or stripped out entirely. I'd personally love to see .pth support removed.
The startup code is currently underspecified and underdocumented, and quite fragile as a result. It represents 20+ years of organic growth without any systematic refactoring to simplify and streamline things.
That's what PEP 432 aims to address, and is something I now expect to have time to get back to for Python 3.5. And yes, one thing those changes should enable is the creation of system Python runtimes on Linux that initialise faster than the current implementation.
This is terrific news and something I greatly anticipate taking advantage of! But the great many of us still on 2.7 likely won't see a benefit, correct? How insane would it be for people to do things like pass -S in the shebang and manually implement the parts of site.py that are actually needed?
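For concreteness, a minimal sketch of what that could look like, with an entirely hypothetical path (note that on 2.7, importing the site module by hand after -S re-runs its full startup work, so plain sys.path manipulation is used instead):

    #!/usr/bin/python -S
    # Sketch of "-S in the shebang plus manual path setup": skip automatic
    # site.py processing and add back only what this tool actually needs.
    # The directory below is hypothetical; substitute the real location.
    import sys

    sys.path.insert(0, "/usr/lib/python2.7/site-packages")

    # ...followed by the tool's normal imports, which now resolve as before.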

Gregory Szorc writes:
But the great many of us still on 2.7 likely won't see a benefit, correct? How insane would it be for people to do things like pass -S in the shebang and manually implement the parts of site.py that are actually needed?
Well, since it probably won't work.... <wink/>

That is to say, site.py typically provides different facilities to different programs -- that's why some parts of it show up as unneeded. So you need to carefully analyze *each* *subprocess* that you propose to invoke with -S and determine which parts of site.py it needs.

In most cases I suspect you would do better to look at alternatives that avoid invoking a subprocess per task, and instead maintain a pool of worker processes (or threads). You might even be able to save network traffic or IPC by caching replies to common requests in the worker processes, which may save more per task than process invocation does.

Even where -S makes sense, I would suggest invoking "python -S" explicitly from the parent process rather than munging the shebang in the children. (You do need to be careful to audit changes in the child programs to be sure they aren't changed in ways that alter their site.py requirements. Putting -S in the shebang may help catch such problems early.)
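A minimal sketch of that worker-pool shape, with a placeholder task function standing in for whatever each short-lived subprocess currently does:

    # Pay interpreter startup once per worker instead of once per task.
    # run_task is a placeholder for the real per-task work.
    import multiprocessing

    def run_task(name):
        # ...import the code under test and exercise it in-process...
        return name, "ok"

    if __name__ == "__main__":
        tasks = ["task-%d" % i for i in range(8)]  # placeholder task list
        pool = multiprocessing.Pool(processes=4)
        try:
            for name, result in pool.map(run_task, tasks):
                print("%s: %s" % (name, result))
        finally:
            pool.close()
            pool.join()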
participants (8):
- Antoine Pitrou
- Donald Stufft
- dw+python-dev@hmmz.org
- Gregory Szorc
- Guido van Rossum
- Nick Coghlan
- Stephen J. Turnbull
- Victor Stinner