[Python-3000] Pre-PEP on fast imports

Phillip J. Eby pje at telecommunity.com
Tue Jun 12 18:30:46 CEST 2007


At 07:18 PM 6/11/2007 -0400, Phillip J. Eby wrote:
>The subclass might look something like this:
>
>      import imp, os, sys
>      from pkgutil import ImpImporter
>
>      suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
>
>      class CachedImporter(ImpImporter):
>          def __init__(self, path):
>              if not os.path.isdir(path):
>                  raise ImportError("Not an existing directory")
>              super(CachedImporter, self).__init__(path)
>              self.refresh()
>
>          def refresh(self):
>              self.cache = set()
>                  for fname in os.listdir(self.path):
>                  base, ext = os.path.splitext(fname)
>                  if ext in suffixes and '.' not in base:
>                      self.cache.add(base)
>
>          def find_module(self, fullname, path=None):
>              if fullname.split(".")[-1] not in self.cache:
>                  return None  # no need to check further
>              return super(CachedImporter, self).find_module(fullname, path)
>
>      sys.path_hooks.append(CachedImporter)

After a bit of reflection, it seems the refresh() method needs to be 
a bit different:

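           def refresh(self):
               # Build the new cache in a local variable first
               cache = set()
               for fname in os.listdir(self.path):
                   base, ext = os.path.splitext(fname)
                   # No extension: may be a package directory
                   if not ext or (ext in suffixes and '.' not in base):
                       cache.add(base)
               # Publish the finished set with a single assignment
               self.cache = cache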

This version fixes two problems.  First, the old version had a race
condition: if you called refresh() while an import was taking place
in another thread, that import could see a partially built cache.
This version avoids that by assigning self.cache only after the new
cache is completely built.
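
As a rough, self-contained sketch of that build-then-swap idea (not
the importer itself): DirectoryNameCache and may_contain() are
hypothetical names, and importlib.machinery.all_suffixes() is assumed
here as a stand-in for imp.get_suffixes(), so that the example runs
on current Python versions.

```python
import os
import tempfile
from importlib.machinery import all_suffixes

# Assumption: all_suffixes() stands in for imp.get_suffixes()
SUFFIXES = set(all_suffixes())


class DirectoryNameCache:
    """Hypothetical stand-in for the importer's cache handling."""

    def __init__(self, path):
        self.path = path
        self.cache = frozenset()
        self.refresh()

    def refresh(self):
        # Fill a local set; self.cache is not touched during the scan,
        # so other threads keep reading the old, complete set.
        names = set()
        for fname in os.listdir(self.path):
            base, ext = os.path.splitext(fname)
            if not ext or (ext in SUFFIXES and "." not in base):
                names.add(base)
        # One attribute rebind publishes the finished set.
        self.cache = frozenset(names)

    def may_contain(self, modname):
        return modname in self.cache


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "foo.py"), "w").close()
    names = DirectoryNameCache(d)
    before = names.cache
    open(os.path.join(d, "bar.py"), "w").close()
    stale = names.may_contain("bar")   # cache not refreshed yet
    names.refresh()
    fresh = names.may_contain("bar")   # visible after the swap
```

Because the set is only shared once it is complete, a lookup running
concurrently with refresh() sees either the old contents or the new
ones, never a half-filled set.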

Second, the old version didn't handle packages at all.  This version 
handles them by treating extension-less filenames as possible package 
directories.  I originally thought this should check for a 
subdirectory and __init__, but this could get very expensive if a 
sys.path directory has a lot of subdirectories (whether or not 
they're packages).  Having false positives in the cache (i.e. names 
that can't actually be imported) could slow things down a bit, but 
*only* if those names match something you're trying to import.  Thus, 
it seems like a reasonable trade-off versus needing to scan every 
subdirectory at startup or even to check whether all those names 
*are* subdirectories.
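
To make the trade-off concrete, here is a small sketch under the
post's notion of a package (a subdirectory containing an __init__
file).  candidate_names() and is_real_package() are hypothetical
helpers, and importlib.machinery.all_suffixes() is again assumed as a
stand-in for imp.get_suffixes().  The scan costs one os.listdir()
call; the per-name check is the work the cache skips at startup.

```python
import os
import tempfile
from importlib.machinery import all_suffixes

# Assumption: all_suffixes() stands in for imp.get_suffixes()
SUFFIXES = set(all_suffixes())


def candidate_names(path):
    """Collect possible module names with a single os.listdir() call."""
    names = set()
    for fname in os.listdir(path):
        base, ext = os.path.splitext(fname)
        if not ext or (ext in SUFFIXES and "." not in base):
            names.add(base)
    return names


def is_real_package(path, name):
    """The per-directory check that the scan deliberately avoids."""
    return os.path.isfile(os.path.join(path, name, "__init__.py"))


with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "realpkg"))
    open(os.path.join(d, "realpkg", "__init__.py"), "w").close()
    os.makedirs(os.path.join(d, "docs"))              # plain directory
    open(os.path.join(d, "README"), "w").close()      # extension-less file
    open(os.path.join(d, "notes.txt"), "w").close()   # unknown suffix
    open(os.path.join(d, "mod.py"), "w").close()

    cached = candidate_names(d)
    # Cached names that are neither the module file nor a real package
    false_positives = {n for n in cached
                       if n != "mod" and not is_real_package(d, n)}
```

Here "docs" and "README" end up in the cache even though neither can
be imported, but they only cost anything if some code actually tries
"import docs" or "import README" and the lookup falls through to the
full search.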


