On Wed, May 2, 2018, 4:53 AM Gregory Szorc <gregory.szorc@gmail.com> wrote:
On 7/19/2017 12:15 PM, Larry Hastings wrote:
> On 07/19/2017 05:59 AM, Victor Stinner wrote:
>> Mercurial startup time is already 45.8x slower than Git whereas tested
>> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
>> developers, with a startup time 2x - 3x slower...
> When Matt Mackall spoke at the Python Language Summit some years back, I
> recall that he specifically complained about Python startup time.  He
> said Python 3 "didn't solve any problems for [them]"--they'd already
> solved their Unicode hygiene problems--and that Python's slow startup
> time was already a big problem for them.  Python 3 being /even slower/
> to start was absolutely one of the reasons why they didn't want to upgrade.
> You might think "what's a few milliseconds matter".  But if you run
> hundreds of commands in a shell script it adds up.  git's speed is one
> of the few bright spots in its UX, and hg's comparative slowness here is
> a palpable disadvantage.
>> So please continue efforts for make Python startup even faster to beat
>> all other programming languages, and finally convince Mercurial to
>> upgrade ;-)
> I believe Mercurial is, finally, slowly porting to Python 3.
>     https://www.mercurial-scm.org/wiki/Python3
> Nevertheless, I can't really be annoyed or upset at them moving slowly
> to adopt Python 3, as Matt's objections were entirely legitimate.

I just now found found this thread when searching the archive for
threads about startup time. And I was searching for threads about
startup time because Mercurial's startup time has been getting slower
over the past few months and this is causing substantial pain.

As I posted back in 2014 [1], CPython's startup overhead was >10% of the
total CPU time in Mercurial's test suite. And when you factor in the
time to import modules that get Mercurial to a point where it can run
commands, it was more like 30%!

Mercurial's full test suite currently runs `hg` ~25,000 times. Using
Victor's startup time numbers of 6.4ms for 2.7 and 14.5ms for
3.7/master, Python startup overhead contributes ~160s on 2.7 and ~360s
on 3.7/master. Even if you divide this by the number of available CPU
cores, we're talking dozens of seconds of wall time just waiting for
CPython to get to a place where Mercurial's first bytecode can execute.

And the problem is worse when you factor in the time it takes to import
Mercurial's own modules.

As a concrete example, I recently landed a Mercurial patch [2] that
stubs out zope.interface to prevent the import of 9 modules on every
`hg` invocation. This "only" saved ~6.94ms for a typical `hg`
invocation. But this decreased the CPU time required to run the test
suite on my i7-6700K from ~4450s to ~3980s (~89.5% of original) - a
reduction of almost 8 minutes of CPU time (and over 1 minute of wall time)!

By the time CPython gets Mercurial to a point where we can run useful
code, we've already blown most of or past the time budget where humans
perceive an action/command as instantaneous. If you ignore startup
overhead, Mercurial's performance compares quite well to Git's for many
operations. But the reality is that CPython startup overhead makes it
look like Mercurial is non-instantaneous before Mercurial even has the
opportunity to execute meaningful code!

Mercurial provides a `chg` program that essentially spins up a daemon
`hg` process running a "command server" so the `chg` program [written in
C - no startup overhead] can dispatch commands to an already-running
Python/`hg` process and avoid paying the startup overhead cost. When you
run Mercurial's test suite using `chg`, it completes *minutes* faster.
`chg` exists mainly as a workaround for slow startup overhead.

Changing gears, my day job is maintaining Firefox's build system. We use
Python heavily in the build system. And again, Python startup overhead
is problematic. I don't have numbers offhand, but we invoke likely a few
hundred Python processes as part of building Firefox. It should be
several thousand. But, we've had to "hack" parts of the build system to
"batch" certain build actions in single process invocations in order to
avoid Python startup overhead. This undermines the ability of some build
tools to formulate a reasonable understanding of the DAG and it causes a
bit of pain for build system developers and makes it difficult to
achieve "no-op" and fast incremental builds because we're always
invoking certain Python processes because we've had to move DAG
awareness out of the build backend and into Python. At some point, we'll
likely replace Python code with Rust so the build system is more "pure"
and easier to maintain and reason about.

I've seen posts in this thread and elsewhere in the CPython development
universe that challenge whether milliseconds in startup time matter.
Speaking as a Mercurial and Firefox build system developer,
*milliseconds absolutely matter*. Going further, *fractions of
milliseconds matter*. For Mercurial's test suite with its ~25,000 Python
process invocations, 1ms translates to ~25s of CPU time. With 2.7,
Mercurial can dispatch commands in ~50ms. When you load common
extensions, it isn't uncommon to see process startup overhead of
100-150ms! A millisecond here. A millisecond there. Before you know it,
we're talking *minutes* of CPU (and potentially wall) time in order to
run Mercurial's test suite (or build Firefox, or ...).

>From my perspective, Python process startup and module import overhead
is a severe problem for Python. I don't say this lightly, but in my mind
the problem causes me to question the viability of Python for popular
use cases, such as CLI applications. When choosing a programming
language, I want one that will scale as a project grows. Vanilla process
overhead has Python starting off significantly slower than compiled code
(or even Perl) and adding module import overhead into the mix makes
Python slower and slower as projects grow. As someone who has to deal
with this slowness on a daily basis, I can tell you that it is extremely
frustrating and it does matter. I hope that the importance of the
problem will be acknowledged (milliseconds *do* matter) and that
creative minds will band together to address it. Since I am
disproportionately impacted by this issue, if there's anything I can do
to help, let me know

Is your Python interpreter statically linked? The Python 3 ones from the anaconda distribution (use Miniconda!) are for Linux and macOS and that roughly halved our startup times.