[Python-3000] Pre-PEP on fast imports
brett at python.org
Tue Jun 12 03:53:43 CEST 2007
On 6/11/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 12:46 AM 6/12/2007 +0200, Giovanni Bajo wrote:
> >Hi Phillip,
> >I'm going to submit a PEP for Python 3000 (possibly backported to
> >Python 2 as an option that is off by default). It's related to imports
> >and how to make them faster. Given your expertise on the subject,
> >I'd appreciate it if you could review my ideas. I briefly discussed it
> >with Alex Martelli a few days ago at PyCon Italia and he was not
> >negative about it.
> >- A single import causes many syscalls (.pyo, .pyc, .py, in both
> >directory and .zip file).
> >- Situation is getting worse and worse with the advent of
> >easy_install which produces many .pth files (longer sys.path).
> >- Python startup time is slow, and a noticeable fraction of it is
> >spent on site.py-related work (a simple hello world takes
> >0.012s if run without -S, and 0.008s if run with -S).
> >- Many people might not be interested in this, but others are really
> >concerned. E.g., again at PyCon Italia, I spoke with one of the
> >leading Sugar programmers (OLPC) who told me that one of the biggest
> >blockers right now is the Python startup time (applications on the
> >latest OLPC prototype take 3-4 seconds to start up). He suggested that this
> >was related to the large number of syscalls made for imports.
> >Proposed solution:
> >- A site cache is introduced. It's a dictionary mapping module names
> >to absolute file paths.
> >- When an import occurs, for each directory/zipfile we walk in
> >sys.path, we read all directory entries and update the site cache
> >with all the Python modules found in it.
> >- If the filepath for a certain module is found in the site cache,
> >the module is directly accessed. Otherwise, sys.path is walked.
> >- The site cache can be cleared with sys.clear_site_cache(). This
> >must be used after manual editing of sys.path (or could be done
> >automatically by making sys.path a list subclass which notices each
> >- The site cache must be manually cleared if a Python file is added
> >to a directory in sys.path after the application has started. This
> >is a rare-enough scenario to require an additional explicit call.
> >- If for whatever reason a filepath found in the site cache cannot
> >be accessed (unmounted device, etc.), ImportError is raised.
> >Again, this is something which is very rare and does not require
> >much attention.
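
A minimal sketch of the site cache described above, for illustration only.
Every name in it (find_module_path, clear_site_cache, _scan_entry) is
hypothetical and not an existing Python API. It assumes that only top-level
.py/.pyc/.pyo files count as modules, and it ignores packages and extension
modules.

```python
import os
import sys
import zipfile

# Hypothetical names throughout: none of this is an existing Python API.
_SUFFIXES = (".py", ".pyc", ".pyo")
_site_cache = {}   # module name -> absolute file path
_scanned = set()   # sys.path entries that have already been listed


def _scan_entry(entry):
    """Record every top-level module found in one directory or zip file."""
    if entry in _scanned:
        return
    _scanned.add(entry)
    if os.path.isdir(entry):
        names = os.listdir(entry)
    elif zipfile.is_zipfile(entry):
        with zipfile.ZipFile(entry) as zf:
            names = [n for n in zf.namelist() if "/" not in n]
    else:
        return
    # Sorted order puts "mod.py" before "mod.pyc", and setdefault keeps
    # the first hit, so entries scanned earlier take precedence.
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext in _SUFFIXES:
            full = os.path.join(os.path.abspath(entry), name)
            _site_cache.setdefault(stem, full)


def find_module_path(modname, path=None):
    """Return the cached path for modname, walking the path on a miss."""
    if modname in _site_cache:
        return _site_cache[modname]
    for entry in (sys.path if path is None else path):
        _scan_entry(entry or os.getcwd())
        if modname in _site_cache:
            return _site_cache[modname]
    return None


def clear_site_cache():
    """Forget everything, e.g. after sys.path or a directory has changed."""
    _site_cache.clear()
    _scanned.clear()
```

In this sketch a file added to an already-scanned directory stays
invisible until clear_site_cache() is called, which is the trade-off the
proposal accepts.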
> Here's a simpler solution, one that's easily testable using existing
> Python versions. Create a subclass of pkgutil.ImpImporter
> (Python >=2.5) that caches a listdir of its contents, and uses it to
> immediately reject any find_module() requests for which matching data
> is not in its cached listdir. Add this class to sys.path_hooks, and
> see if it speeds things up.
I thought about this use case when writing importlib, as a way to lower the
penalty of importing over NFS, and this is exactly how I would do it as well
(except I would use the code from importlib instead of pkgutil =).
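
A rough sketch of the listdir-caching idea, written against the importlib
that ships with current Python 3 rather than pkgutil.ImpImporter. It is not
the code from either poster. ListdirCachingFinder and install() are
hypothetical names, and the finder only handles plain directories. Note
that importlib's own FileFinder already keeps a directory cache, so this is
mainly an illustration of the technique.

```python
import importlib.machinery as machinery
import os
import sys


class ListdirCachingFinder:
    """Path entry finder that rejects names absent from a cached listdir."""

    def __init__(self, path):
        if not os.path.isdir(path):
            # Raising ImportError lets later hooks (e.g. zipimport) try.
            raise ImportError("not a directory", path=path)
        self._path = path
        # The real lookup is delegated to the standard FileFinder.
        self._delegate = machinery.FileFinder(
            path,
            (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
            (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
            (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
        )
        self.invalidate_caches()

    def invalidate_caches(self):
        # Keep the part before the first dot: "foo.py" -> "foo",
        # and a package directory "pkg" stays "pkg".
        self._stems = {n.split(".", 1)[0] for n in os.listdir(self._path)}
        self._delegate.invalidate_caches()

    def find_spec(self, fullname, target=None):
        tail = fullname.rpartition(".")[2]
        if tail not in self._stems:
            return None  # rejected from the cached listing alone
        return self._delegate.find_spec(fullname, target)


def install():
    """Put the hook first and drop finders created by the default hooks."""
    sys.path_hooks.insert(0, ListdirCachingFinder)
    sys.path_importer_cache.clear()
```

After install(), a file added to a directory that has already been listed
is only picked up once importlib.invalidate_caches() is called.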