Re: [Python-Dev] Implementing PEP 382, Namespace Packages
At 03:44 PM 5/29/2010 -0700, Brett Cannon wrote:
On Sat, May 29, 2010 at 12:29, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Am 29.05.2010 21:06, schrieb P.J. Eby:
At 08:45 PM 5/29/2010 +0200, Martin v. Löwis wrote:
In it he says that PEP 382 is being deferred until it can address PEP 302 loaders. I can't find any follow-up to this. I don't see any discussion in PEP 382 about PEP 302 loaders, so I assume this issue was never resolved. Does it need to be before PEP 382 is implemented? Are we wasting our time by designing and (eventually) coding before this issue is resolved?
Yes, and yes.
Is there anything we can do to help regarding that?
You could comment on the proposal I made back then, or propose a different solution.
[sorry for the fundamental PEP questions, but I think PEP 382 came about while I was on my python-dev sabbatical last year]
I have some questions about the PEP which might help clarify how to handle the API changes.
For finders, their search algorithm is changed in a couple of ways. One is that modules are given priority over packages (is that intentional, Martin, or just an oversight?). Two, the package search requires checking for a .pth file on top of an __init__.py. This will change finders that could before simply do an existence check on an __init__ "file" (or whatever the storage back-end happened to be) and make it into a list-and-search, which one would hope wasn't costly, but in some cases might be if the paths to files are not stored in a hierarchical fashion (e.g. zip files list entire file paths in their TOC, or a sqlite3 DB which uses a path for keys will have to list **all** keys, filter them down to just the relevant directory, and then look for .pth, or some such approach). Are we worried about possible performance implications of this search?
No. First, an importer would not be required to implement it in a precisely analogous way; you could have database entries or a special consolidated index in a zipfile, if you wanted to do it like that. (In practice, Python's zipimporter has a memory cache of the TOC, and a simple database index on paths would make a search for .pth's in a subdirectory trivial for the database case.)
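To make the "consolidated index" idea concrete, here is a minimal sketch that walks a zip archive's table of contents once and groups the .pth entries by directory, so a later per-directory lookup is a dictionary access. It uses the standard zipfile module rather than zipimporter's internal cache, and the function name build_pth_index and the archive contents are assumptions made up for illustration; it is not how zipimporter itself is implemented.

```python
import io
import posixpath
import zipfile
from collections import defaultdict


def build_pth_index(zf):
    """Map each archive directory to the .pth files directly inside it.

    Hypothetical helper: one pass over the TOC, then O(1) lookups.
    """
    index = defaultdict(list)
    for name in zf.namelist():  # the TOC lists full paths, not a tree
        if name.endswith(".pth"):
            directory, filename = posixpath.split(name)
            index[directory].append(filename)
    return dict(index)


# Build a small in-memory archive with made-up contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("zope/zope.interface.pth", "*\n")
    zf.writestr("zope/zope.component.pth", "*\n")
    zf.writestr("zope/interface/__init__.py", "")
    zf.writestr("other/__init__.py", "")

with zipfile.ZipFile(buf) as zf:
    pth_index = build_pth_index(zf)

# A finder could now ask "does zope/ contain any .pth file?" cheaply.
has_pth = bool(pth_index.get("zope"))
```

The same grouping could be done with an index on a path column in the database case; the point is only that the listing cost is paid once per archive rather than once per import.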
I say no, but I just want to make sure we are not and that people are aware of the design shift required in finders. This entire worry would be alleviated if only .pth files named after the package were supported, much like *.pkg files in pkgutil.
Which would completely break one of the major use cases of the PEP, which is precisely to ensure that you can install two pieces of code to the same namespace without either one overwriting the other's files.
And then the search for the __init__.py begins on the newly modified __path__, which I assume ends with the first __init__ found on __path__, but if no file is found it's okay and essentially an empty module with just module-specific attributes is used? In other words, can a .pth file replace an __init__ file in delineating a package?
Yes.
Or is it purely additive? I assume the latter for compatibility reasons,
Nope. The idea is specifically to allow separately installed projects to create a package without overwriting any files (causing conflicts for system installers).
but the PEP says "a directory is considered a package if it **either** contains a file named __init__.py, **or** a file whose name ends with ".pth"" (emphasis mine). Otherwise I assume that the search will be done simply with ``os.path.isdir(os.path.join(sys_path_entry, top_level_package_name))`` and all existing paths will be added to __path__. Will they come before or after the directory where the *.pth was found? And will any subsequent *.pth files found in other directories also be executed?
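As a rough illustration of the sentence quoted from the PEP, the check below treats a directory as a package if it holds either an __init__.py or any file ending in ".pth". The helper name is_package_dir is hypothetical, and the sketch only covers plain file system directories, not other importers.

```python
import os
import tempfile


def is_package_dir(path):
    """Hypothetical check: __init__.py OR any *.pth file marks a package."""
    try:
        entries = os.listdir(path)  # list-and-search, not a single stat
    except OSError:
        return False
    return "__init__.py" in entries or any(e.endswith(".pth") for e in entries)


with tempfile.TemporaryDirectory() as root:
    with_init = os.path.join(root, "with_init")
    with_pth = os.path.join(root, "with_pth")
    plain = os.path.join(root, "plain")
    for d in (with_init, with_pth, plain):
        os.mkdir(d)
    open(os.path.join(with_init, "__init__.py"), "w").close()
    open(os.path.join(with_pth, "zope.interface.pth"), "w").close()

    results = {
        "with_init": is_package_dir(with_init),
        "with_pth": is_package_dir(with_pth),
        "plain": is_package_dir(plain),
    }
```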
As for how "*" works, is this limited to top-level packages, or will sub-packages participate as well?
Sub-packages as well.
I assume the former, but it is not directly stated in the PEP. If the latter, is a dotted package name changed to ``os.sep.join([sys_path_entry, package_name.replace(".", os.sep)])``?
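The mapping being asked about can be sketched as follows: turn the dotted name into a relative directory and keep every search-path entry under which that directory exists. This is only an illustration of the question as posed; the function name candidate_dirs and the directory layout are assumptions, and the PEP may well intend sub-packages to be searched along the parent's __path__ instead of sys.path.

```python
import os
import tempfile


def candidate_dirs(package_name, search_path):
    """Hypothetical helper: existing directories for a dotted package name."""
    relative = package_name.replace(".", os.sep)
    found = []
    for entry in search_path:
        candidate = os.path.join(entry, relative)
        if os.path.isdir(candidate):
            found.append(candidate)  # order follows the search path
    return found


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "a")
    second = os.path.join(root, "b")
    os.makedirs(os.path.join(first, "zope", "interface"))
    os.makedirs(os.path.join(second, "zope", "component"))

    top = candidate_dirs("zope", [first, second])
    sub = candidate_dirs("zope.interface", [first, second])
```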
For sys.path_hooks, I am assuming import will simply skip over passing that as it is a marker that __path__ represents a namespace package and not in any way functional. Although with sys.namespace_packages, is leaving the "*" in __path__ truly necessary?
I'm going to leave these to Martin to answer.
For the search of paths to use to extend, are we limiting ourselves to actual file system entries on sys.path (as pkgutil does),
pkgutil doesn't have such a limitation, except in the case of extend_path, and that limitation is one that PEP 382 intends to remove.
or do we want to support other storage back-ends? To do the latter I would suggest having a successful path discovery be when a finder can be created for the hypothetical directory from sys.path_hooks.
The downside to that is that NullImporter is the default importer, so you'd still have to special case it. It would make more sense to add to the PEP 302 protocols directly.
I'll shut up now and stop causing trouble. =)
May I suggest you take a look at the implementation draft in my other email? I realize in retrospect it doesn't handle __init__ searching in precisely the order proposed by the PEP, but I'm not sure it would be that difficult to add. (It also needs to split the operation into find/load pieces, but that's also a straightforward mod: just defer the module loading until the end, and return a wrapper around the loader that finishes the process.)
On Sun, May 30, 2010 at 00:40, P.J. Eby <pje@telecommunity.com> wrote:
At 03:44 PM 5/29/2010 -0700, Brett Cannon wrote:
On Sat, May 29, 2010 at 12:29, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Am 29.05.2010 21:06, schrieb P.J. Eby:
At 08:45 PM 5/29/2010 +0200, Martin v. Löwis wrote:
In it he says that PEP 382 is being deferred until it can address PEP 302 loaders. I can't find any follow-up to this. I don't see any discussion in PEP 382 about PEP 302 loaders, so I assume this issue was never resolved. Does it need to be before PEP 382 is implemented? Are we wasting our time by designing and (eventually) coding before this issue is resolved?
Yes, and yes.
Is there anything we can do to help regarding that?
You could comment on the proposal I made back then, or propose a different solution.
[sorry for the fundamental PEP questions, but I think PEP 382 came about while I was on my python-dev sabbatical last year]
I have some questions about the PEP which might help clarify how to handle the API changes.
For finders, their search algorithm is changed in a couple of ways. One is that modules are given priority over packages (is that intentional, Martin, or just an oversight?). Two, the package search requires checking for a .pth file on top of an __init__.py. This will change finders that could before simply do an existence check on an __init__ "file" (or whatever the storage back-end happened to be) and make it into a list-and-search, which one would hope wasn't costly, but in some cases might be if the paths to files are not stored in a hierarchical fashion (e.g. zip files list entire file paths in their TOC, or a sqlite3 DB which uses a path for keys will have to list **all** keys, filter them down to just the relevant directory, and then look for .pth, or some such approach). Are we worried about possible performance implications of this search?
No. First, an importer would not be required to implement it in a precisely analogous way; you could have database entries or a special consolidated index in a zipfile, if you wanted to do it like that. (In practice, Python's zipimporter has a memory cache of the TOC, and a simple database index on paths would make a search for .pth's in a subdirectory trivial for the database case.)
Sure, for the two examples this works, but who knows about other odd back-ends people might be using. Granted, this is all hypothetical and why I figured we wouldn't worry about it.
I say no, but I just want to make sure we are not and that people are aware of the design shift required in finders. This entire worry would be alleviated if only .pth files named after the package were supported, much like *.pkg files in pkgutil.
Which would completely break one of the major use cases of the PEP, which is precisely to ensure that you can install two pieces of code to the same namespace without either one overwriting the other's files.
The PEP says the goal is to span packages across directories. If you split something like zope into multiple directories, does having a separate zope.pth file in each of those directories really cause a problem here? You are not importing them so it isn't like you are worrying about precedence. If you specify that all .pth files found are run then using the same file name in all package directories isn't an issue. But I guess packages that do this want to keep unique files per directory separation that they support and not have to fix the file names at distribution time.
And then the search for the __init__.py begins on the newly modified __path__, which I assume ends with the first __init__ found on __path__, but if no file is found it's okay and essentially an empty module with just module-specific attributes is used? In other words, can a .pth file replace an __init__ file in delineating a package?
Yes.
Or is it purely additive? I assume the latter for compatibility reasons,
Nope. The idea is specifically to allow separately installed projects to create a package without overwriting any files (causing conflicts for system installers).
but the PEP says "a directory is considered a package if it **either** contains a file named __init__.py, **or** a file whose name ends with ".pth"" (emphasis mine). Otherwise I assume that the search will be done simply with ``os.path.isdir(os.path.join(sys_path_entry, top_level_package_name))`` and all existing paths will be added to __path__. Will they come before or after the directory where the *.pth was found? And will any subsequent *.pth files found in other directories also be executed?
As for how "*" works, is this limited to top-level packages, or will sub-packages participate as well?
Sub-packages as well.
I assume the former, but it is not directly stated in the PEP. If the latter, is a dotted package name changed to ``os.sep.join([sys_path_entry, package_name.replace(".", os.sep)])``?
For sys.path_hooks, I am assuming import will simply skip over passing that as it is a marker that __path__ represents a namespace package and not in any way functional. Although with sys.namespace_packages, is leaving the "*" in __path__ truly necessary?
I'm going to leave these to Martin to answer.
For the search of paths to use to extend, are we limiting ourselves to actual file system entries on sys.path (as pkgutil does),
pkgutil doesn't have such a limitation, except in the case of extend_path, and that limitation is one that PEP 382 intends to remove.
I am asking precisely because pkgutil.extend_path has that limitation, and that is what the PEP refers to. If the PEP wants to remove the limitation it should clearly state how it is going to do that.
or do we want to support other storage back-ends? To do the latter I would suggest having a successful path discovery be when a finder can be created for the hypothetical directory from sys.path_hooks.
The downside to that is that NullImporter is the default importer, so you'd still have to special case it. It would make more sense to add to the PEP 302 protocols directly.
But import is the one that adds NullImporter. And if import is the one figuring out what paths do and don't work, then it doesn't matter about NullImporter as it will know what does and does not fail. And as you said, you can special-case it. As for adding to the PEP 302 protocols, it's a question of how much we want importer implementors to have control over this versus us. I personally would rather keep any protocol extensions simple and have import handle as many of the details as possible. I think PEP 3147 has shown that keeping import details under our control as much as possible can be beneficial, as it doesn't put pre-existing importers at a disadvantage.
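The discovery rule suggested here (a hypothetical directory counts as found when some hook on sys.path_hooks can create a finder for it) might look roughly like the sketch below. A path hook signals that it declines a path by raising ImportError, which is the PEP 302 convention; the helper name finder_for and the "mem://" hook are invented for illustration, and NullImporter caching is deliberately not modeled: a path that every hook declines simply yields None.

```python
import sys


def finder_for(path, hooks=None):
    """Hypothetical helper: first finder a path hook yields, else None."""
    for hook in (sys.path_hooks if hooks is None else hooks):
        try:
            return hook(path)
        except ImportError:
            continue  # this hook does not handle the path; try the next
    return None


def memory_hook(path):
    """Made-up path hook for a non-file-system back-end."""
    if not path.startswith("mem://"):
        raise ImportError("not a memory path")
    return ("memory-finder", path)  # stand-in for a real finder object


found = finder_for("mem://site/zope", hooks=[memory_hook])
missing = finder_for("/no/such/dir/zope", hooks=[memory_hook])

# Paths for which a finder could be created would be added to __path__.
portions = [p for p in ("mem://site/zope", "/no/such/dir/zope")
            if finder_for(p, hooks=[memory_hook]) is not None]
```

Because import itself performs the probing in this arrangement, existing importers would not need to grow a new method, which is the trade-off against extending the PEP 302 protocols directly.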
I'll shut up now and stop causing trouble. =)
May I suggest you take a look at the implementation draft in my other email? I realize in retrospect it doesn't handle __init__ searching in precisely the order proposed by the PEP, but I'm not sure it would be that difficult to add. (It also needs to split the operation into find/load pieces, but that's also a straightforward mod: just defer the module loading until the end, and return a wrapper around the loader that finishes the process.)
I replied separately to that email.
The PEP says the goal is to span packages across directories. If you split something like zope into multiple directories, does having a separate zope.pth file in each of those directories really cause a problem here?
I think pje already answered this: yes, you do split zope into multiple packages. But then, you may install all of them into the same folder. This has caused pain for Linux package management tools in particular, because they dislike it if two packages install the same file. So if you can arrange the pth files to have non-overlapping names, you can still install them into the same directory, and dpkg is happy. Regards, Martin