[Python-3000] Pre-PEP on fast imports
Phillip J. Eby
pje at telecommunity.com
Tue Jun 12 18:30:46 CEST 2007
At 07:18 PM 6/11/2007 -0400, Phillip J. Eby wrote:
>The subclass might look something like this:
>
>      import imp, os, sys
>      from pkgutil import ImpImporter
>
>      suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
>
>      class CachedImporter(ImpImporter):
>          def __init__(self, path):
>              if not os.path.isdir(path):
>                  raise ImportError("Not an existing directory")
>              super(CachedImporter, self).__init__(path)
>              self.refresh()
>
>          def refresh(self):
>              self.cache = set()
>              for fname in os.listdir(path):
>                  base, ext = os.path.splitext(fname)
>                  if ext in suffixes and '.' not in base:
>                      self.cache.add(base)
>
>          def find_module(self, fullname, path=None):
>              if fullname.split(".")[-1] not in self.cache:
>                  return None  # no need to check further
>              return super(CachedImporter, self).find_module(fullname, path)
>
>      sys.path_hooks.append(CachedImporter)

After a bit of reflection, it seems the refresh() method needs to be
a bit different:

          def refresh(self):
              cache = set()
              for fname in os.listdir(self.path):
                  base, ext = os.path.splitext(fname)
                  if not ext or (ext in suffixes and '.' not in base):
                      cache.add(base)
              self.cache = cache

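This version fixes two problems. First, a race condition could occur
if you called refresh() while an import was taking place in another
thread: the old version emptied self.cache and refilled it in place,
so a concurrent find_module() could see a partially built cache. This
version avoids that by only rebinding self.cache after the new cache
is completely built.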
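
To illustrate the build-then-swap idea on its own, here is a minimal
sketch that leaves out the importer machinery. SwappingCache and
SUFFIXES are hypothetical names for this example only; SUFFIXES is a
hard-coded stand-in for the set built from imp.get_suffixes(). It
assumes that rebinding an attribute is effectively atomic in CPython,
so a reader sees either the old set or the new one, never a half-filled
one.

```python
import os
import tempfile

# Assumption: hard-coded stand-in for the suffixes from imp.get_suffixes()
SUFFIXES = {".py", ".pyc"}


class SwappingCache(object):
    """Hypothetical holder for just the caching part of the importer."""

    def __init__(self, path):
        self.path = path
        self.refresh()

    def refresh(self):
        # Build the new set in a local variable first...
        cache = set()
        for fname in os.listdir(self.path):
            base, ext = os.path.splitext(fname)
            if not ext or (ext in SUFFIXES and '.' not in base):
                cache.add(base)
        # ...and publish it with a single rebinding at the very end.
        self.cache = cache


demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "spam.py"), "w").close()

holder = SwappingCache(demo_dir)
snapshot = holder.cache   # what a concurrent lookup might be holding

open(os.path.join(demo_dir, "eggs.py"), "w").close()
holder.refresh()          # builds a new set; the old one is never mutated
```

After the second refresh(), snapshot is still the complete old set and
holder.cache is a different, complete new set, which is the property
the revised method relies on.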
Second, the old version didn't handle packages at all. This version
handles them by treating extension-less filenames as possible package
directories. I originally thought this should check for a
subdirectory and __init__, but this could get very expensive if a
sys.path directory has a lot of subdirectories (whether or not
they're packages). Having false positives in the cache (i.e. names
that can't actually be imported) could slow things down a bit, but
*only* if those names match something you're trying to import. Thus,
it seems like a reasonable trade-off versus needing to scan every
subdirectory at startup or even to check whether all those names
*are* subdirectories.
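
As a rough illustration of that trade-off, the sketch below applies
the same filename test to a scratch directory. scan_names and SUFFIXES
are hypothetical names for this example, and SUFFIXES is again a
hard-coded stand-in for imp.get_suffixes(). The point is only to show
which names end up in the cache without any stat() calls or looking
for __init__ files.

```python
import os
import tempfile

# Assumption: hard-coded stand-in for the suffixes from imp.get_suffixes()
SUFFIXES = {".py", ".pyc"}


def scan_names(path):
    """Return candidate top-level names using one os.listdir() call."""
    names = set()
    for fname in os.listdir(path):
        base, ext = os.path.splitext(fname)
        # Extension-less entries are kept as *possible* packages.
        if not ext or (ext in SUFFIXES and '.' not in base):
            names.add(base)
    return names


root = tempfile.mkdtemp()
open(os.path.join(root, "spam.py"), "w").close()             # real module
os.mkdir(os.path.join(root, "pkg"))                           # real package
open(os.path.join(root, "pkg", "__init__.py"), "w").close()
os.mkdir(os.path.join(root, "data"))                          # no __init__
open(os.path.join(root, "README"), "w").close()               # plain file
open(os.path.join(root, "notes.txt"), "w").close()            # other suffix

names = scan_names(root)
```

Here "data" and "README" are false positives: they are cached even
though neither can be imported. They only cost anything if something
actually tries to import a module called data or README, in which case
the lookup falls through to the normal, slower search.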