[Python-Dev] Implementing PEP 382, Namespace Packages

Mon May 31 22:10:45 CEST 2010

On Mon, May 31, 2010 at 00:53, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> For finders, their search algorithm is changed in a couple of ways.
>> One is that modules are given priority over packages (is that
>> intentional, Martin, or just an oversight?).
>
> That's an oversight. Notice, however, that it's really not the case that
> currently directories have precedence over modules, either: if a directory
> is later on the __path__ than a module, it's still the module that gets
> imported. So the precedence takes place only when a module and a directory
> exist in the same directory.
>
> In any case, I have now fixed it.
>
>> Two, the package search
>> requires checking for a .pth file on top of an __init__.py. This will
>> change finders that could before simply do an existence check on an
>> __init__ "file"
>
> You are reading something into the PEP that isn't there yet. PEP 302
> currently isn't considered, and the question of this discussion is precisely
> how the loaders API should be changed.
>
>> (or whatever the storage back-end happened to be) and
>> make it into a list-and-search which one would hope wasn't costly, but
>> in same cases might be if the paths to files is not stored in a
>> hierarchical fashion (e.g. zip files list entire files paths in their
>> TOC or a sqlite3 DB which uses a path for keys will have to list
>> **all** keys, sort them to just the relevant directory, and then look
>> for .pth or some such approach).
>
> First, I think it's up to the specific loader mechanism whether
> PEP 382 should be supported at all. It should be possible to implement
> it if desired, but if it's not feasible (e.g. for URL loaders), pth
> files just may not get considered. The loader may well provide a
> different mechanism to support namespace packages.
>
>> Are we worried about possible
>>
>> performance implications of this search?
>
> For the specific case of zip files, I'm not. I don't think performance will
> suffer at all.
>
>> And then the search for the __init__.py begins on the newly modified
>> __path__, which I assume ends with the first __init__ found on
>> __path__, but if no file is found it's okay and essentially an empty
>> module with just module-specific attributes is used?
>
> Correct.
>
>> In other words,
>> can a .pth file replace an __init__ file in delineating a package?
>
> That's what it means by '''a directory is considered a package if it either
> contains a file named __init__.py, or a file whose name ends with ".pth".'''
>
>> Or
>> is it purely additive? I assume the latter for compatibility reasons,
>> but the PEP says "a directory is considered a package if it **either**
>> contains a file named __init__.py, **or** a file whose name ends with
>> ".pth"" (emphasis mine).
>
> Why do you think this causes an incompatibility?

It's just if a project has no __init__ older finders won't process it,
that's all. But it looks like they are going to have to change
somewhat anyway so that's not an issue.

>
>> Otherwise I assume that the search will be
>> done simply with ``os.path.isdir(os.path.join(sys_path_entry,
>> top_level_package_name)`` and all existing paths will be added to
>> __path__. Will they come before or after the directory where the *.pth
>> was found? And will any subsequent *.pth files found in other
>> directories also be executed?
>
> I may misremember, but from reading the text, it seems to say "no".
> It should work like the current pth mechanism (plus *, minus import).
>
>> As for how "*" works, is this limited to top-level packages, or will
>> sub-packages participate as well? I assume the former, but it is not
>> directly stated in the PEP.
>
> And indeed, the latter is intended. You should be able to create namespace
> packages on all levels.
>
>> If the latter, is a dotted package name
>> changed to ``os.sep.join(sy_path_entry, package_name.replace('".",
>> os.sep)``?
>
> No. Instead, the parent package's __path__ is being searched for
> directories; sys.path is not considered anymore. I have fixed the text.
>
>> For sys.path_hooks, I am assuming import will simply skip over passing
>> that as it is a marker that __path__ represents a namsepace package
>> and not in any way functional. Although with sys.namespace_packages,
>> is leaving the "*" in __path__ truly necessary?
>
> It would help with efficiency, no?

Not sure how having "*" in __path__ helps with efficiency. What are
you thinking it will help with specifically?

>
>> For the search of paths to use to extend, are we limiting ourselves to
>> actual file system entries on sys.path (as pkgutil does), or do we
>> want to support other storage back-ends? To do the latter I would
>> suggest having a successful path discovery be when a finder can be
>> created for the hypothetical directory from sys.path_hooks.
>
> Again: PEP 302 isn't really considered yet. Proposals are welcome.
>
>> The PEP (seems to) ask finders to look for a .pth file(s), calculate
>> __path__, and then get a loader for the __init__. You could have
>> finders grow a find_namespace method which returns the contents of the
>> requisite .pth file(s).
>
> I must be misunderstanding the concept of finders. Why is it that it would
> be their function to process the pth files, and not the function of the
> loader?

I'm thinking from the perspective of finding an __init__ module that
exists somewhere else than where the .pth file was discovered. Someone
has to find the .pth files and provide their contents. Someone else
needs to find the __init__ module for the package (if it exists). Then
someone needs to load a namespace package, potentially from the
__init__ module. It's that second step -- find the __init__ module --
that makes me think the finder is more involved. It doesn't have to be
by definition, but seeing the word "find" just makes me think
"finder".