[Python-3000] Pre-PEP on fast imports
Brett Cannon
brett at python.org
Thu Jun 14 06:01:43 CEST 2007
On 6/12/07, Giovanni Bajo <rasky at develer.com> wrote:
>
> On 6/12/2007 6:30 PM, Phillip J. Eby wrote:
>
> >> import imp, os, sys
> >> from pkgutil import ImpImporter
> >>
> >> suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
> >>
> >> class CachedImporter(ImpImporter):
> >>     def __init__(self, path):
> >>         if not os.path.isdir(path):
> >>             raise ImportError("Not an existing directory")
> >>         super(CachedImporter, self).__init__(path)
> >>         self.refresh()
> >>
> >>     def refresh(self):
> >>         self.cache = set()
> >>         for fname in os.listdir(self.path):
> >>             base, ext = os.path.splitext(fname)
> >>             if ext in suffixes and '.' not in base:
> >>                 self.cache.add(base)
> >>
> >>     def find_module(self, fullname, path=None):
> >>         if fullname.split(".")[-1] not in self.cache:
> >>             return None  # no need to check further
> >>         return super(CachedImporter, self).find_module(fullname,
> >>                                                        path)
> >>
> >> sys.path_hooks.append(CachedImporter)
> >
> > After a bit of reflection, it seems the refresh() method needs to be a
> > bit different:
> >
> > def refresh(self):
> >     cache = set()
> >     for fname in os.listdir(self.path):
> >         base, ext = os.path.splitext(fname)
> >         if not ext or (ext in suffixes and '.' not in base):
> >             cache.add(base)
> >     self.cache = cache
> >
> > This version fixes two problems: first, a race condition could occur if
> > you called refresh() while an import was taking place in another
> > thread. This version fixes that by only updating self.cache after the
> > new cache is completely built.
> >
> > Second, the old version didn't handle packages at all. This version
> > handles them by treating extension-less filenames as possible package
> > directories. I originally thought this should check for a subdirectory
> > and __init__, but this could get very expensive if a sys.path directory
> > has a lot of subdirectories (whether or not they're packages). Having
> > false positives in the cache (i.e. names that can't actually be
> > imported) could slow things down a bit, but *only* if those names match
> > something you're trying to import. Thus, it seems like a reasonable
> > trade-off versus needing to scan every subdirectory at startup or even
> > to check whether all those names *are* subdirectories.
>
> There are a couple of other things I'll fix as soon as I try it. The first is
> that I'd call refresh() lazily on the first find_module because I don't
> want to listdir() directories on sys.path that will never be accessed.
>
> The idea of using sys.path_hooks is very clever (I hadn't thought of
> it... because I didn't know of path_hooks in the first place! It appears
> to be undocumented and sparsely indexed by Google as well), and it will
> probably help me a lot in my task of fixing this problem in the 2.x series.
PEP 302 documents all of this, but unfortunately it was never incorporated
into the official docs.
I also have some pseudocode of how import (roughly) works at
sandbox/trunk/import_in_py/pseudocode.py .
-Brett
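
A minimal sketch of the lazy refresh() idea from the quoted text, written
against the later importlib API rather than the 2.x imp/pkgutil modules used
in the thread. The names LazyCachedFinder and install are hypothetical, and
the suffix matching is a simplification. Note that importlib's own FileFinder
already caches directory listings, so this is illustrative only.

```python
import importlib.machinery
import os
import sys

# Every file suffix the import system recognises (source, bytecode, extensions).
SUFFIXES = tuple(importlib.machinery.all_suffixes())

# Loader/suffix pairs handed to FileFinder, which does the real lookup.
_LOADER_DETAILS = [
    (importlib.machinery.ExtensionFileLoader,
     importlib.machinery.EXTENSION_SUFFIXES),
    (importlib.machinery.SourceFileLoader,
     importlib.machinery.SOURCE_SUFFIXES),
    (importlib.machinery.SourcelessFileLoader,
     importlib.machinery.BYTECODE_SUFFIXES),
]


class LazyCachedFinder:
    """Hypothetical path-entry finder that lists its directory on first use."""

    def __init__(self, path):
        if not os.path.isdir(path):
            # Raising ImportError tells the import system to try the next hook.
            raise ImportError("Not an existing directory", path=path)
        self.path = path
        self.cache = None  # filled in by the first find_spec() call
        self._finder = importlib.machinery.FileFinder(path, *_LOADER_DETAILS)

    def refresh(self):
        # Build the new set locally and swap it in with a single assignment,
        # so another thread never sees a half-built cache.
        cache = set()
        for fname in os.listdir(self.path):
            if '.' not in fname:
                cache.add(fname)  # possible package directory
                continue
            for suffix in SUFFIXES:
                if fname.endswith(suffix):
                    cache.add(fname[:-len(suffix)])
        self.cache = cache

    def find_spec(self, fullname, target=None):
        if self.cache is None:
            self.refresh()  # deferred until this directory is actually needed
        if fullname.rpartition('.')[2] not in self.cache:
            return None  # no need to check further
        return self._finder.find_spec(fullname, target)

    def invalidate_caches(self):
        self.cache = None
        self._finder.invalidate_caches()


def install():
    """Register the finder ahead of the default hooks (hypothetical helper)."""
    sys.path_hooks.insert(0, LazyCachedFinder)
    sys.path_importer_cache.clear()
```

Calling install() would route sys.path directories through the cached finder.
Entries that are not directories raise ImportError in __init__ and fall
through to the remaining hooks. False positives in the cache only cost an
extra lookup, as in the trade-off described in the quoted text.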