[Python-Dev] Implementing PEP 382, Namespace Packages

Mon May 31 02:59:57 CEST 2010

On Sat, May 29, 2010 at 15:56, P.J. Eby <pje at telecommunity.com> wrote:
> At 09:29 PM 5/29/2010 +0200, Martin v. Löwis wrote:
>>
>> Am 29.05.2010 21:06, schrieb P.J. Eby:
>>>
>>> At 08:45 PM 5/29/2010 +0200, Martin v. Löwis wrote:
>>>>>
>>>>> In it he says that PEP 382 is being deferred until it can address PEP
>>>>> 302 loaders. I can't find any follow-up to this. I don't see any
>>>>> discussion in PEP 382 about PEP 302 loaders, so I assume this issue was
>>>>> never resolved. Does it need to be before PEP 382 is implemented? Are we
>>>>> wasting our time by designing and (eventually) coding before this issue
>>>>> is resolved?
>>>>
>>>> Yes, and yes.
>>>
>>> Is there anything we can do to help regarding that?
>>
>> You could comment on the proposal I made back then, or propose a different
>> solution.
>
> Looking at that proposal, I don't follow how changing *loaders* (vs.
> importers) would help.  If an importer's find_module doesn't natively
> support PEP 382, then there's no way to get a loader for the package in the
> first place.  Today, namespace packages work fine with PEP 302 loaders,
> because the namespace-ness is really only about setting up the __path__, and
> detecting that you need to do this in the first place.
>
> In the PEP 302 scheme, then, it's either importers that have to change, or
> the process that invokes them.  Being able to ask an importer the
> equivalents of os.path.join, listdir, and get_data would suffice to make an
> import process that could do the trick.
>
> Essentially, you'd ask each importer to first attempt to find the module,
> and then ask it (or the loader, if the find worked) whether
> packagename/*.pth exists, and then process the contents of any such files.
>
> I don't think there's a need to have a special method for executing a
> package __init__, since what you'd do in the case where there are .pth files
> but no __init__ is to simply continue the search to the end of sys.path (or
> the parent package __path__), and *then* create the module with an
> appropriate __path__.
>
> If at any point the find_module() call succeeds, then subsequent importers
> will just be asked for .pth files, which can then be processed into the
> __path__ of the now-loaded module.
>
> IOW, something like this (very rough draft):
>
>    pth_contents = []
>    module = None
>
>    for pathitem in syspath_or_parent__path__:
>
>        importer = pkgutil.get_importer(pathitem)
>        if importer is None:
>            continue
>
>        if module is None:
>            # PEP 302: find_module() returns None when nothing is found
>            loader = importer.find_module(fullname)
>            if loader is not None:
>                # errors here should propagate
>                module = loader.load_module(fullname)
>                if not hasattr(module, '__path__'):
>                    # found, but not a package
>                    return module
>
>        pc = get_pth_contents(importer)
>        if pc is not None:
>            subpath = os.path.join(pathitem, modulebasename)
>            pth_contents.append(subpath)
>            pth_contents.extend(pc)
>            if '*' not in pth_contents:
>                # got a package, but not a namespace
>                break
>
>    if pth_contents:
>        if module is None:
>            # No __init__, but we have paths, so make an empty package
>            module = types.ModuleType(fullname)  # assumes "import types"
>            module.__path__ = []
>        modify__path__(module, pth_contents)
>
>    return module
>

Is it wise to modify __path__ post-import? Today people can make sure
that __path__ is set to what they want before potentially reading it
in their __init__ module by making the pkgutil.extend_path() call
first. This approach would defer the modification until after the
import, so no __init__ code could rely on what __path__ eventually becomes.
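
To illustrate the current idiom, here is a minimal sketch, assuming a
hypothetical package named demo_nspkg split across two temporary
directories. Each portion's __init__ calls pkgutil.extend_path() first,
so the code that follows it already sees the extended __path__. The
package name, the directory layout, and the PATH_SEEN_DURING_INIT
attribute are made up for the example.

```python
import os
import sys
import tempfile
import textwrap

# Hypothetical __init__ source: extend __path__ first, then read it.
INIT_SOURCE = textwrap.dedent("""\
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)
    # From here on, __init__ code can rely on the extended __path__.
    PATH_SEEN_DURING_INIT = list(__path__)
""")


def make_portion(root, pkgname):
    """Create one on-disk portion of the package under root."""
    pkgdir = os.path.join(root, pkgname)
    os.makedirs(pkgdir)
    with open(os.path.join(pkgdir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    return pkgdir


# Two separate sys.path entries, each holding a portion of demo_nspkg.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
make_portion(root_a, "demo_nspkg")
make_portion(root_b, "demo_nspkg")
sys.path[:0] = [root_a, root_b]

import demo_nspkg

# Both portions were already visible while __init__ was executing.
print(demo_nspkg.PATH_SEEN_DURING_INIT)
```

If __path__ were only adjusted after load_module() returned, the
PATH_SEEN_DURING_INIT snapshot would hold just the first portion.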

> Obviously, the details are all in the 'get_pth_contents()', and
> 'modify__path__()' functions, and the above process would do extra work in
> the case where an individual importer implements PEP 382 on its own
> (although why would it?).
>
> It's also the case that this algorithm will be slow to fail imports when
> implemented as a meta_path hook, since it will be doing an extra pass over
> sys.path or the parent __path__, in addition to the one that's done by the
> normal __import__ machinery.  (Though that's not an issue for Python 3.x,
> since this can be built into the core __import__).
>
> (Technically, the 3.x version should probably ask meta_path hooks for their
> .pth files as well, but I'm not entirely sure that that's a meaningful thing
> to ask.)
>
> The PEP 302 questions all boil down to how get_pth_contents() is
> implemented, and whether 'subpath' really should be created with
> os.path.join.  It may be enough to simply add a get_pth_contents() method to
> the importer protocol (one that returns None or a list of lines), and maybe
> a get_subpath(modulename) method that returns the path string that should be
> used for a subdirectory importer (i.e. __path__ entry), or None if no such
> subpath exists.

Code already out there uses os.path.join() to extend __path__ (e.g.
Django), so I would stick with that unless we want to start
transitioning to '/' only.
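
For a plain directory on sys.path, the two suggested methods could be
sketched roughly as below. This is only an illustration under stated
assumptions: DirectoryPthHelper is a hypothetical class, not part of
PEP 302 or pkgutil; it assumes the .pth files live inside the package
directory (the packagename/*.pth layout described above); and it takes
the module name as an argument to get_pth_contents(), which the
proposal does not specify. The subpath is built with os.path.join().

```python
import os
import tempfile


class DirectoryPthHelper:
    """Hypothetical filesystem-only helper; not an existing API."""

    def __init__(self, pathitem):
        self.pathitem = pathitem

    def get_subpath(self, modulename):
        """Return the __path__ entry for modulename, or None if absent."""
        subpath = os.path.join(self.pathitem, modulename)
        return subpath if os.path.isdir(subpath) else None

    def get_pth_contents(self, modulename):
        """Return the non-empty lines of modulename/*.pth, or None."""
        subpath = self.get_subpath(modulename)
        if subpath is None:
            return None
        names = sorted(n for n in os.listdir(subpath) if n.endswith(".pth"))
        if not names:
            return None
        lines = []
        for name in names:
            with open(os.path.join(subpath, name)) as f:
                lines.extend(line.strip() for line in f if line.strip())
        return lines


# Usage: one directory holding a package portion with a single .pth file.
root = tempfile.mkdtemp()
pkgdir = os.path.join(root, "demo_nspkg")
os.makedirs(pkgdir)
with open(os.path.join(pkgdir, "demo.pth"), "w") as f:
    f.write("*\n")

helper = DirectoryPthHelper(root)
print(helper.get_subpath("demo_nspkg"))
print(helper.get_pth_contents("demo_nspkg"))
```

A zip or other non-filesystem importer would need its own way of
producing the subpath string, which is where a '/'-only convention
would matter.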