<br><br><div><span class="gmail_quote">On 6/11/07, <b class="gmail_sendername">Phillip J. Eby</b> <<a href="mailto:pje@telecommunity.com">pje@telecommunity.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
At 12:46 AM 6/12/2007 +0200, Giovanni Bajo wrote:<br>>Hi Phillip,<br>><br>>I'm going to submit a PEP for Python 3000 (and possibly backported<br>>as an option off by default in Python 2). It's related to imports
<br>>and how to make them faster. Given your expertise on the subject,<br>>I'd appreciate it if you could review my ideas. I briefly spoke about it<br>>with Alex Martelli a few days ago at PyCon Italia and he was not
<br>>negative about it.<br>><br>>Problems:<br>><br>>- A single import causes many syscalls (.pyo, .pyc, .py, in both<br>>directories and .zip files).<br>>- The situation is getting worse and worse with the advent of
<br>>easy_install, which produces many .pth files (longer sys.path).<br>>- Python startup time is slow, and a noticeable fraction of it is<br>>spent on site.py-related work (a simple hello world run takes<br>
>0.012s if run without -S, and 0.008s if run with -S).<br>>- Many people might not be interested in this, but others are really<br>>concerned. E.g., again at PyCon Italia, I spoke with one of the<br>>leading Sugar programmers (OLPC) who told me that one of the biggest
<br>>blockers right now is the Python startup time (applications on the latest<br>>OLPC prototype take 3-4 seconds to start up). He suggested that this<br>>was related to the large number of syscalls made for imports.<br>
><br>><br>>Proposed solution:<br>><br>>- A site cache is introduced. It's a dictionary mapping module names<br>>to absolute file paths.<br>>- When an import occurs, for each directory/zipfile we walk in
<br>>sys.path, we read all directory entries, and update the site cache<br>>with all the Python modules found in that directory/zipfile.<br>>- If the filepath for a certain module is found in the site cache,
<br>>the module is directly accessed. Otherwise, sys.path is walked.<br>>- The site cache can be cleared with sys.clear_site_cache(). This<br>>must be used after manual editing of sys.path (or could be done<br>>automatically by making
sys.path a list subclass which notices each<br>>modification).<br>>- The site cache must be manually cleared if a Python file is added<br>>to a directory in sys.path after the application has started. This<br>>is a rare-enough scenario to require an additional explicit call.
<br>>- If for whatever reason a filepath found in the site cache cannot<br>>be accessed (unmounted device, whatever), ImportError is raised.<br>>Again, this is something which is very rare and does not require<br>
>much attention.<br><br>Here's a simpler solution, one that's easily testable using existing<br>Python versions. Create a subclass of pkgutil.ImpImporter<br>(Python >=2.5) that caches a listdir of its contents, and uses it to
<br>immediately reject any find_module() requests for which matching data<br>is not in its cached listdir. Add this class to sys.path_hooks, and<br>see if it speeds things up.</blockquote><div><br><br>I thought about this use case when writing importlib, as a way to lower the penalty of importing over NFS, and this is exactly how I would do it as well (except I would use the code from importlib instead of pkgutil =).
<br></div><br>-Brett<br></div>
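<div><br>The listdir-caching idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code anyone in the thread wrote: it targets current Python 3, so it wraps importlib.machinery.FileFinder and implements find_spec() rather than subclassing pkgutil.ImpImporter and overriding find_module(). The names ListdirCachingFinder and install are hypothetical. Note also that the stock FileFinder in recent CPython versions already keeps a similar directory cache, so the sketch mainly makes the mechanism explicit.<br></div>

```python
import importlib.machinery as machinery
import os
import sys


class ListdirCachingFinder:
    """Hypothetical path entry finder: one listdir per directory, then
    fast rejection of any module name that cannot be in that directory."""

    def __init__(self, path):
        if not os.path.isdir(path):
            # Not a directory: let the other path hooks (e.g. zipimport) try.
            raise ImportError("not a directory", path=path)
        self.path = path
        self._names = None  # filled lazily on the first lookup
        # Delegate the real work to the standard file-system finder.
        self._delegate = machinery.FileFinder(
            path,
            (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
            (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
            (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
        )

    def _fill_cache(self):
        # Keep only the part before the first dot, so "foo.py", "foo.pyc",
        # "foo.cpython-312.so" and a package directory "foo" all map to "foo".
        # Assumption: a case-sensitive file system.
        self._names = {entry.split(".", 1)[0] for entry in os.listdir(self.path)}

    def find_spec(self, fullname, target=None):
        if self._names is None:
            self._fill_cache()
        tail = fullname.rpartition(".")[2]
        if tail not in self._names:
            return None  # rejected without touching the file system again
        return self._delegate.find_spec(fullname, target)

    def invalidate_caches(self):
        # Called via importlib.invalidate_caches() after files are added.
        self._names = None
        self._delegate.invalidate_caches()


def install():
    """Register the finder ahead of the default hooks (not called here)."""
    sys.path_hooks.insert(0, ListdirCachingFinder)
    sys.path_importer_cache.clear()
```

<div>Calling install() early in a program and timing startup with and without it would be one way to run the experiment suggested in the thread. As with the proposed site cache, a module file added to a directory after the cache is filled stays invisible until importlib.invalidate_caches() is called.<br></div>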