[issue28637] Python startup performance regression

Wolfgang Maier report at bugs.python.org
Tue Nov 8 07:08:24 EST 2016


Wolfgang Maier added the comment:

STINNER Victor added the comment:
>BUT when Python is started from a virtual environment (created by the
>"venv" module), the re module is imported by default.
>
>haypo at speed-python$ venv/bin/python3 -c 'import sys; print("re" in sys.modules)'
>True

Exciting! I just verified that this is true. Running python3 from a venv really seems to be the only situation in which the re module gets imported during startup (at least, this one branch in site.py is the only place that uses it).
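For anyone who wants to repeat the check for other modules or other interpreters, here is a small sketch that wraps the one-liner quoted above. The helper name loaded_at_startup is made up for illustration; point the python argument at venv/bin/python3 to test the venv case.

```python
import subprocess
import sys


def loaded_at_startup(module_name, python=sys.executable):
    """Return True if module_name is already in sys.modules right after
    the given interpreter has started (hypothetical helper)."""
    # Same check as the quoted one-liner, run in a fresh interpreter.
    code = "import sys; print({!r} in sys.modules)".format(module_name)
    out = subprocess.check_output([python, "-c", code])
    return out.decode().strip() == "True"


if __name__ == "__main__":
    # The result for "re" depends on the Python version and on whether
    # the interpreter lives inside a venv.
    print("re imported at startup:", loaded_at_startup("re"))
```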

If adding a single enum import to re causes such a big difference in startup time, I wonder how much more could be gained for the venv case by not importing re at all!

Turns out that the complete code block in site.py that is used by venvs and that was partially shown by @haypo is:

CONFIG_LINE = r'^(?P<key>(\w|[-_])+)\s*=\s*(?P<value>.*)\s*$'

def venv(known_paths):
    global PREFIXES, ENABLE_USER_SITE

    env = os.environ
    if sys.platform == 'darwin' and '__PYVENV_LAUNCHER__' in env:
        executable = os.environ['__PYVENV_LAUNCHER__']
    else:
        executable = sys.executable
    exe_dir, _ = os.path.split(os.path.abspath(executable))
    site_prefix = os.path.dirname(exe_dir)
    sys._home = None
    conf_basename = 'pyvenv.cfg'
    candidate_confs = [
        conffile for conffile in (
            os.path.join(exe_dir, conf_basename),
            os.path.join(site_prefix, conf_basename)
            )
        if os.path.isfile(conffile)
        ]

    if candidate_confs:
        import re
        config_line = re.compile(CONFIG_LINE)
        virtual_conf = candidate_confs[0]
        system_site = "true"
        # Issue 25185: Use UTF-8, as that's what the venv module uses when
        # writing the file.
        with open(virtual_conf, encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                m = config_line.match(line)
                if m:
                    d = m.groupdict()
                    key, value = d['key'].lower(), d['value']
                    if key == 'include-system-site-packages':
                        system_site = value.lower()
                    elif key == 'home':
                        sys._home = value

        sys.prefix = sys.exec_prefix = site_prefix

        # Doing this here ensures venv takes precedence over user-site
        addsitepackages(known_paths, [sys.prefix])

        # addsitepackages will process site_prefix again if its in PREFIXES,
        # but that's ok; known_paths will prevent anything being added twice
        if system_site == "true":
            PREFIXES.insert(0, sys.prefix)
        else:
            PREFIXES = [sys.prefix]
            ENABLE_USER_SITE = False

    return known_paths

So all the re module is good for here is parsing simple config file records with key/value pairs separated by '='. Shouldn't it be straightforward to implement that logic right inside that block, without requiring a giant import?
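To make the idea concrete, here is a minimal sketch of what such a replacement could look like, using only str methods. The helper name parse_config_line is hypothetical and not part of site.py. Note one difference in behavior: unlike CONFIG_LINE, this version does not restrict the characters allowed in the key, so it is slightly more permissive.

```python
def parse_config_line(line):
    """Split a 'key = value' record without importing re.

    Hypothetical helper for illustration. Returns (key, value) with the
    key lowercased, or None if the line has no '=' or an empty key.
    """
    key, sep, value = line.partition('=')
    if not sep:
        # No '=' on this line, so it is not a key/value record.
        return None
    key = key.strip().lower()
    value = value.strip()
    if not key:
        return None
    return key, value


if __name__ == "__main__":
    for line in ("home = /usr/bin",
                 "include-system-site-packages = false",
                 "not a record"):
        print(repr(line), "->", parse_config_line(line))
```

Inside venv(), the loop body would then only need to call this on each line and check the returned key against 'include-system-site-packages' and 'home', as the existing code already does with the match groups.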

This should still be easily doable for 3.6. It seems as if it would solve the whole issue, and it would probably speed up the performance tests much more than any reverted changesets could.

What do you think?

----------
nosy: +wolma

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28637>
_______________________________________

