[Python-Dev] Rethinking intern() and its data structure

Robert Collins robert.collins at canonical.com
Fri Apr 10 11:19:39 CEST 2009

On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote:

> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior?

We sure have. 'bzr --profile-imports' reports on the time to import
different modules (both cumulative and individually).

We have a lazy module loader that allows us to defer loading modules we
might not use (though if they are needed we are in fact going to pay for
loading them eventually).
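
To illustrate the idea of deferring a module load until first use, here is a minimal sketch. It is not bzr's actual lazy loader; the class name LazyModule and its attributes are hypothetical, and it only covers plain attribute access on the module.

```python
import importlib


class LazyModule(object):
    """Hypothetical stand-in for a module that is imported on first use."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing has been imported yet

    def _load(self):
        # Pay the import cost only the first time the module is needed.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself,
        # so _name, _module and _load never reach this method.
        return getattr(self._load(), attr)


# The import happens at the first attribute access, not at this line.
json = LazyModule('json')
encoded = json.dumps([1, 2, 3])
```

As the paragraph above says, the cost is only postponed: once any attribute is touched, the full import is paid.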

We monkeypatch the standard library where modules we want are
unreasonably expensive to import (for instance by making a regex we
wouldn't use compile lazily rather than at import time).
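
A lazily compiled regex can be sketched roughly as follows. This is an illustration only, not the code bzr uses to patch the standard library; LazyRegex is a hypothetical name, and the sketch simply forwards attribute access to the compiled pattern.

```python
import re


class LazyRegex(object):
    """Hypothetical proxy that compiles its pattern on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # compilation is deferred

    def _get(self):
        # Compile once, the first time any regex method is requested.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Forward search, match, sub, etc. to the real compiled pattern.
        return getattr(self._get(), name)


# Creating the object is cheap; re.compile runs at the first search().
digits = LazyRegex(r'\d+')
found = digits.search('abc123').group(0)
```

If the module-level regex is never used in a given run, the compile cost is never paid at all.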

>   IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

Certainly, import time is part of it:
robertc at lifeless-64:~$ python -m timeit \
    -s 'import sys; import bzrlib.errors' \
    "del sys.modules['bzrlib.errors']; import bzrlib.errors"
10 loops, best of 3: 18.7 msec per loop

(errors.py is 3027 lines long with 347 exception classes).
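
The same kind of measurement can be sketched from inside Python. The helper name time_reimport is hypothetical, and the stdlib module colorsys is used only as a stand-in so the example runs without bzrlib installed. Like the command above, it measures a hot-cache re-import: the module is dropped from sys.modules and imported again.

```python
import importlib
import sys
import timeit


def time_reimport(module_name, repeat=3, number=10):
    """Return the best per-import time in seconds for a forced re-import."""
    importlib.import_module(module_name)  # make sure it is loaded once

    def reimport():
        # Forget the cached module so the import machinery runs again.
        del sys.modules[module_name]
        importlib.import_module(module_name)

    best = min(timeit.repeat(reimport, repeat=repeat, number=number))
    return best / number


seconds = time_reimport('colorsys')
print('%.3f msec per import' % (seconds * 1000.0))
```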

We've also looked lower: Python does a lot of stat operations while
searching for imports and determining whether the .pyc is up to date.
These appear to matter only on cold-cache imports (but they matter a
lot then); in hot-cache situations they are insignificant.

Uhm, there's probably more, but I just wanted to note that we have done
quite a bit of analysis. I think a large chunk of our problem is having
too much code loaded when only a small fraction will be used in any one
operation. Consider importing bzrlib.errors: it accounts for 10% of the
startup time for 'bzr help'. In any one operation only a few of those
exceptions will be used, and typically none.
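
For a rough feel of what defining several hundred exception classes costs, here is a sketch under stated assumptions. The function names are hypothetical and the generated classes are placeholders, not bzrlib's real exceptions. It times only class-object creation, not reading or unmarshalling a .pyc, so it cannot reproduce the 18.7 msec figure above.

```python
import timeit


def define_exception_classes(count):
    """Create `count` trivial Exception subclasses, as a module body would."""
    classes = []
    for i in range(count):
        cls = type('Error%d' % i, (Exception,),
                   {'__doc__': 'Placeholder error number %d' % i})
        classes.append(cls)
    return classes


def time_class_definitions(count=347, number=20):
    """Return the average seconds taken to define `count` classes."""
    total = timeit.timeit(lambda: define_exception_classes(count),
                          number=number)
    return total / number


print('%.3f msec to define 347 classes'
      % (time_class_definitions() * 1000.0))
```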
