[Python-3000] Pre-PEP on fast imports
Giovanni Bajo
rasky at develer.com
Tue Jun 12 18:40:01 CEST 2007
On 6/12/2007 6:30 PM, Phillip J. Eby wrote:
>> import imp, os, sys
>> from pkgutil import ImpImporter
>>
>> suffixes = set(ext for ext,mode,typ in imp.get_suffixes())
>>
>> class CachedImporter(ImpImporter):
>>     def __init__(self, path):
>>         if not os.path.isdir(path):
>>             raise ImportError("Not an existing directory")
>>         super(CachedImporter, self).__init__(path)
>>         self.refresh()
>>
>>     def refresh(self):
>>         self.cache = set()
>>         for fname in os.listdir(self.path):
>>             base, ext = os.path.splitext(fname)
>>             if ext in suffixes and '.' not in base:
>>                 self.cache.add(base)
>>
>>     def find_module(self, fullname, path=None):
>>         if fullname.split(".")[-1] not in self.cache:
>>             return None  # no need to check further
>>         return super(CachedImporter, self).find_module(fullname, path)
>>
>> sys.path_hooks.append(CachedImporter)
>
> After a bit of reflection, it seems the refresh() method needs to be a
> bit different:
>
>     def refresh(self):
>         cache = set()
>         for fname in os.listdir(self.path):
>             base, ext = os.path.splitext(fname)
>             if not ext or (ext in suffixes and '.' not in base):
>                 cache.add(base)
>         self.cache = cache
>
> This version fixes two problems: first, a race condition could occur if
> you called refresh() while an import was taking place in another
> thread. This version fixes that by only updating self.cache after the
> new cache is completely built.
>
> Second, the old version didn't handle packages at all. This version
> handles them by treating extension-less filenames as possible package
> directories. I originally thought this should check for a subdirectory
> and __init__, but this could get very expensive if a sys.path directory
> has a lot of subdirectories (whether or not they're packages). Having
> false positives in the cache (i.e. names that can't actually be
> imported) could slow things down a bit, but *only* if those names match
> something you're trying to import. Thus, it seems like a reasonable
> trade-off versus needing to scan every subdirectory at startup or even
> to check whether all those names *are* subdirectories.
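
The trade-off described above can be tried out in isolation. The following is only a sketch of the cache-building step, not the importer itself: build_cache() is a hypothetical standalone function, and importlib.machinery.all_suffixes() is assumed here as a stand-in for imp.get_suffixes() on interpreters that no longer ship the imp module.

```python
import os
from importlib.machinery import all_suffixes

# Assumption: all_suffixes() replaces imp.get_suffixes() from the quoted code.
SUFFIXES = set(all_suffixes())


def build_cache(path):
    """Return the names in `path` that might be importable (hypothetical helper)."""
    cache = set()  # built locally, so a caller can swap it in with one assignment
    for fname in os.listdir(path):
        base, ext = os.path.splitext(fname)
        # Extension-less names are kept as *possible* package directories,
        # without checking that they are directories or contain __init__.
        if not ext or (ext in SUFFIXES and '.' not in base):
            cache.add(base)
    return cache
```

In a directory holding mod.py, a pkg/ subdirectory, a plain README file and notes.txt, this should return mod, pkg and README. README is a false positive, but it only costs anything if something actually tries to import a module named README.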
There are a couple of other things I'll fix as soon as I try it. The first is
that I'd call refresh() lazily on the first find_module() call, because I don't
want to listdir() directories on sys.path that will never be accessed.
The idea of using sys.path_hooks is very clever (I hadn't thought of
it... because I didn't know about path_hooks in the first place! It appears
to be undocumented and sparsely indexed by Google as well), and it will
probably help me a lot in my task of fixing this problem in the 2.x series.
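
A rough sketch of the lazy variant, untested against the real importer: LazyDirCache and might_contain() are hypothetical names used only for illustration, and importlib.machinery.all_suffixes() is again assumed in place of imp.get_suffixes(). The point is only that listdir() is deferred until the first lookup.

```python
import os
from importlib.machinery import all_suffixes


class LazyDirCache:
    """Hypothetical helper: scans its directory on first lookup, not at creation."""

    def __init__(self, path, suffixes=None):
        self.path = path
        # Assumption: all_suffixes() stands in for imp.get_suffixes().
        self.suffixes = set(all_suffixes()) if suffixes is None else set(suffixes)
        self.cache = None  # None means "not scanned yet"

    def refresh(self):
        cache = set()
        for fname in os.listdir(self.path):
            base, ext = os.path.splitext(fname)
            if not ext or (ext in self.suffixes and '.' not in base):
                cache.add(base)
        self.cache = cache  # single assignment once the set is complete

    def might_contain(self, fullname):
        if self.cache is None:
            self.refresh()  # first lookup pays for the listdir()
        return fullname.split(".")[-1] in self.cache
```

In the importer itself, the same check would go at the top of find_module(), so directories on sys.path that are never searched are never listed.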
--
Giovanni Bajo