Re: [Python-Dev] Implementing PEP 382, Namespace Packages
On 29.05.2010 21:06, P.J. Eby wrote:
At 08:45 PM 5/29/2010 +0200, Martin v. Löwis wrote:
In it he says that PEP 382 is being deferred until it can address PEP 302 loaders. I can't find any follow-up to this. I don't see any discussion in PEP 382 about PEP 302 loaders, so I assume this issue was never resolved. Does it need to be before PEP 382 is implemented? Are we wasting our time by designing and (eventually) coding before this issue is resolved?
Yes, and yes.
Is there anything we can do to help regarding that?
You could comment on the proposal I made back then, or propose a different solution.

Regards,
Martin
On Sat, May 29, 2010 at 12:29, "Martin v. Löwis" <martin@v.loewis.de> wrote:
You could comment on the proposal I made back then, or propose a different solution.
[Sorry for the fundamental PEP questions, but I think PEP 382 came about while I was on my python-dev sabbatical last year.]

I have some questions about the PEP which might help clarify how to handle the API changes.

For finders, their search algorithm is changed in a couple of ways. One is that modules are given priority over packages (is that intentional, Martin, or just an oversight?). Two, the package search requires checking for a .pth file on top of an __init__.py. This will change finders that could before simply do an existence check on an __init__ "file" (or whatever the storage back-end happened to be) and make it into a list-and-search, which one would hope wasn't costly, but in some cases might be if the paths to files are not stored in a hierarchical fashion (e.g. zip files list entire file paths in their TOC, or a sqlite3 DB which uses a path for keys will have to list **all** keys, filter them down to just the relevant directory, and then look for .pth, or some such approach). Are we worried about possible performance implications of this search? I say no, but I just want to make sure that we are not, and that people are aware of the design shift required in finders. This entire worry would be alleviated if only .pth files named after the package were supported, much like *.pkg files in pkgutil.

And then the search for the __init__.py begins on the newly modified __path__, which I assume ends with the first __init__ found on __path__, but if no file is found it's okay and essentially an empty module with just module-specific attributes is used? In other words, can a .pth file replace an __init__ file in delineating a package? Or is it purely additive? I assume the latter for compatibility reasons, but the PEP says "a directory is considered a package if it **either** contains a file named __init__.py, **or** a file whose name ends with ".pth"" (emphasis mine).
Otherwise I assume that the search will be done simply with ``os.path.isdir(os.path.join(sys_path_entry, top_level_package_name))`` and all existing paths will be added to __path__. Will they come before or after the directory where the *.pth was found? And will any subsequent *.pth files found in other directories also be executed?

As for how "*" works, is this limited to top-level packages, or will sub-packages participate as well? I assume the former, but it is not directly stated in the PEP. If the latter, is a dotted package name changed to ``os.path.join(sys_path_entry, package_name.replace(".", os.sep))``?

For sys.path_hooks, I am assuming import will simply skip over passing "*" to the hooks, as it is a marker that __path__ represents a namespace package and is not in any way functional. Although with sys.namespace_packages, is leaving the "*" in __path__ truly necessary?

For the search of paths to use to extend, are we limiting ourselves to actual file system entries on sys.path (as pkgutil does), or do we want to support other storage back-ends? To do the latter I would suggest having a successful path discovery be when a finder can be created for the hypothetical directory from sys.path_hooks.

OK, I *think* that's all of my clarification questions when it comes to the PEP. =)

Now, on to API discussion. The PEP (seems to) ask finders to look for .pth file(s), calculate __path__, and then get a loader for the __init__. You could have finders grow a find_namespace method which returns the contents of the requisite .pth file(s). Import could then take that, calculate __path__, and then use that new search path to find a loader for the __init__ (I am assuming there is an __init__ file somewhere). That's straightforward and makes supporting .pth files additive for finders. The trick then becomes how the heck you get the new __path__ value into the module through the loader, as up to this point it has calculated __path__ on its own.
You could slightly abuse load_module's semantics for reloading and stick the namespace module into sys.modules before calling the loader for __init__, and change the semantics definition such that if __path__ is already defined you don't change it. Unfortunately that seems rather messy in the face of reloads that want a fresh __path__.

Another possibility is to have the loader add the new paths, but to have the calculated value of __path__ stored in sys.namespace_packages. That way the loader can simply calculate its own version and extend it with what the dictionary provides. This lets loaders get what import thinks __path__ should be while still having a chance to tweak things.

If you want even more abstraction I would change is_package to return what __path__ should be when it is a package, and provide an ABC that does the proper calculation of the extended __path__ value for is_package() so they can do ``return [extras] + super().is_package()`` for packages. But unfortunately, because load_module is overloaded with responsibilities, there is no way to dynamically add support for any of this to existing loaders like there is with finders (unless we factor out the responsibilities of load_module so it isn't so overworked and is entirely optional to implement, but that goes beyond this PEP's scope).

There is also the issue of reloading with this delineation of work, since finders are not necessarily called by imp.reload. Otherwise the loaders will have to recalculate everything import calculates in order to find the __init__ module to begin with.

The only other option I can think of is to tweak find_module to always take a path argument, not just meta path finders. Then the calculated __path__ value can be passed in through find_module and thus passed on to the loader through a constructor or some such.
That avoids duplicating the work of calculating the extended __path__ value in both the finder and the loader, and avoids having to cache it somewhere outside of the importer's reach where it might go stale. The finder simply passes the __path__ value on to the loader however it wants (most likely through a constructor call or internal caching). This also acts as a performance perk when searching for the __init__ module, as having the 'path' argument set can act as a flag to not look for a module but only a package. This would make it no longer an additive feature to finders, but wouldn't require anything to change in loaders directly.

I'll shut up now and stop causing trouble. =)
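Of the options above, the sys.namespace_packages one can be sketched in a few lines. This is only an illustration under assumptions: `sys.namespace_packages` does not exist in any released Python, so a plain dictionary (`namespace_packages`) stands in for it, and `merged_package_path` is a hypothetical helper name, not part of PEP 302 or PEP 382.

```python
def merged_package_path(own_entries, fullname, namespace_packages):
    """Return the loader's own __path__ entries followed by import's extras.

    namespace_packages is a stand-in for the proposed sys.namespace_packages:
    a mapping from package name to the path entries import calculated.
    """
    merged = list(own_entries)
    # Append what import calculated, skipping entries the loader already has.
    for entry in namespace_packages.get(fullname, ()):
        if entry not in merged:
            merged.append(entry)
    return merged
```

Note that the merge builds and returns a new list; `list.extend()` returns None, so it cannot be used directly in a return expression.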
At 09:29 PM 5/29/2010 +0200, Martin v. Löwis wrote:
You could comment on the proposal I made back then, or propose a different solution.
Looking at that proposal, I don't follow how changing *loaders* (vs. importers) would help. If an importer's find_module doesn't natively support PEP 382, then there's no way to get a loader for the package in the first place. Today, namespace packages work fine with PEP 302 loaders, because the namespace-ness is really only about setting up the __path__, and detecting that you need to do this in the first place.

In the PEP 302 scheme, then, it's either importers that have to change, or the process that invokes them. Being able to ask an importer the equivalents of os.path.join, listdir, and get_data would suffice to make an import process that could do the trick.

Essentially, you'd ask each importer to first attempt to find the module, and then asking it (or the loader, if the find worked) whether packagename/*.pth exists, and then processing their contents.

I don't think there's a need to have a special method for executing a package __init__, since what you'd do in the case where there are .pth but no __init__, is to simply continue the search to the end of sys.path (or the parent package __path__), and *then* create the module with an appropriate __path__.

If at any point the find_module() call succeeds, then subsequent importers will just be asked for .pth files, which can then be processed into the __path__ of the now-loaded module.
IOW, something like this (very rough draft):

    pth_contents = []
    module = None

    for pathitem in syspath_or_parent__path__:

        importer = pkgutil.get_importer(pathitem)
        if importer is None:
            continue

        if module is None:
            try:
                loader = importer.find_module(fullname)
            except ImportError:
                pass
            else:
                # errors here should propagate
                module = loader.load_module(fullname)
                if not hasattr(module, '__path__'):
                    # found, but not a package
                    return module

        pc = get_pth_contents(importer)
        if pc is not None:
            subpath = os.path.join(pathitem, modulebasename)
            pth_contents.append(subpath)
            pth_contents.extend(pc)
            if '*' not in pth_contents:
                # got a package, but not a namespace
                break

    if pth_contents:
        if module is None:
            # No __init__, but we have paths, so make an empty package
            module = imp.new_module(fullname)  # new module object w/empty __path__
            module.__path__ = []
        modify__path__(module, pth_contents)

    return module

Obviously, the details are all in the 'get_pth_contents()', and 'modify__path__()' functions, and the above process would do extra work in the case where an individual importer implements PEP 382 on its own (although why would it?).

It's also the case that this algorithm will be slow to fail imports when implemented as a meta_path hook, since it will be doing an extra pass over sys.path or the parent __path__, in addition to the one that's done by the normal __import__ machinery. (Though that's not an issue for Python 3.x, since this can be built into the core __import__).

(Technically, the 3.x version should probably ask meta_path hooks for their .pth files as well, but I'm not entirely sure that that's a meaningful thing to ask.)

The PEP 302 questions all boil down to how get_pth_contents() is implemented, and whether 'subpath' really should be created with os.path.join. Simply adding a get_pth_contents() method to the importer protocol (that returns None or a list of lines), and maybe a get_subpath(modulename) method that returns the path string that should be used for a subdirectory importer (i.e. __path__ entry), or None if no such subpath exists, should be enough.

Adding finer-grained methods is probably a waste of time, as there aren't likely to be many use cases for asking an *importer* to fetch files (vs. a loader). (In my case, of course, I'd use the pkgutil-style approach of augmenting importers or loaders that don't natively implement a needed method, which still allows third parties to register their own support for a fourth party's loader or importer type.)
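To make the two proposed importer methods concrete, here is a minimal sketch for a plain-directory importer. It is an illustration only: `DirectoryImporterSketch` is a hypothetical class, and passing `modulename` to `get_pth_contents()` is an assumption, since the message above does not spell out that method's signature.

```python
import os


class DirectoryImporterSketch:
    """Hypothetical filesystem importer exposing the two proposed methods."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def get_subpath(self, modulename):
        """Return the directory usable as a __path__ entry, or None."""
        subpath = os.path.join(self.path_entry, modulename)
        return subpath if os.path.isdir(subpath) else None

    def get_pth_contents(self, modulename):
        """Return the non-blank lines of <modulename>/*.pth, or None."""
        subpath = self.get_subpath(modulename)
        if subpath is None:
            return None
        pth_names = sorted(n for n in os.listdir(subpath) if n.endswith(".pth"))
        if not pth_names:
            return None
        lines = []
        for name in pth_names:
            with open(os.path.join(subpath, name)) as f:
                # Keep only lines with content, stripped of whitespace.
                lines.extend(line.strip() for line in f if line.strip())
        return lines
```

An importer backed by a zip file or database would answer the same two questions from its own index instead of calling os.listdir, which is the point of asking the importer rather than the filesystem.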
On Sat, May 29, 2010 at 15:56, P.J. Eby <pje@telecommunity.com> wrote:
Looking at that proposal, I don't follow how changing *loaders* (vs. importers) would help. If an importer's find_module doesn't natively support PEP 382, then there's no way to get a loader for the package in the first place. Today, namespace packages work fine with PEP 302 loaders, because the namespace-ness is really only about setting up the __path__, and detecting that you need to do this in the first place.
In the PEP 302 scheme, then, it's either importers that have to change, or the process that invokes them. Being able to ask an importer the equivalents of os.path.join, listdir, and get_data would suffice to make an import process that could do the trick.
Essentially, you'd ask each importer to first attempt to find the module, and then asking it (or the loader, if the find worked) whether packagename/*.pth exists, and then processing their contents.
I don't think there's a need to have a special method for executing a package __init__, since what you'd do in the case where there are .pth but no __init__, is to simply continue the search to the end of sys.path (or the parent package __path__), and *then* create the module with an appropriate __path__.
If at any point the find_module() call succeeds, then subsequent importers will just be asked for .pth files, which can then be processed into the __path__ of the now-loaded module.
IOW, something like this (very rough draft):
    pth_contents = []
    module = None

    for pathitem in syspath_or_parent__path__:

        importer = pkgutil.get_importer(pathitem)
        if importer is None:
            continue

        if module is None:
            try:
                loader = importer.find_module(fullname)
            except ImportError:
                pass
            else:
                # errors here should propagate
                module = loader.load_module(fullname)
                if not hasattr(module, '__path__'):
                    # found, but not a package
                    return module

        pc = get_pth_contents(importer)
        if pc is not None:
            subpath = os.path.join(pathitem, modulebasename)
            pth_contents.append(subpath)
            pth_contents.extend(pc)
            if '*' not in pth_contents:
                # got a package, but not a namespace
                break

    if pth_contents:
        if module is None:
            # No __init__, but we have paths, so make an empty package
            module = imp.new_module(fullname)  # new module object w/empty __path__
            module.__path__ = []
        modify__path__(module, pth_contents)

    return module
Is it wise to modify __path__ post-import? Today people can make sure that __path__ is set to what they want before potentially reading it in their __init__ module by making the pkgutil.extend_path() call first. This would actually defer to after the import and thus not allow any __init__ code to rely on what __path__ eventually becomes.
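For readers unfamiliar with the idiom being referred to, the sketch below shows what pkgutil.extend_path() computes. It is a demonstration under assumptions: the package name `pep382_demo_pkg` and the helper `demo_extend_path` are made up for illustration, and the throwaway directories exist only so the call has something to find.

```python
import os
import shutil
import sys
import tempfile
from pkgutil import extend_path


def demo_extend_path(name="pep382_demo_pkg"):
    """Create two roots that each hold a portion of `name`, then return
    what extend_path() computes starting from the first portion."""
    roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    try:
        for root in roots:
            pkg_dir = os.path.join(root, name)
            os.mkdir(pkg_dir)
            # An empty __init__.py marks each portion as a package.
            open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        sys.path[:0] = roots
        # Inside a real __init__.py this would be the first statement:
        #     __path__ = extend_path(__path__, __name__)
        initial = [os.path.join(roots[0], name)]
        return extend_path(initial, name)
    finally:
        # Undo the sys.path change and remove the throwaway directories.
        for root in roots:
            if root in sys.path:
                sys.path.remove(root)
            shutil.rmtree(root, ignore_errors=True)
```

Because the call sits at the top of __init__.py, any code later in that file sees the extended __path__, which is the property that would be lost if __path__ were only adjusted after load_module() returns.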
Obviously, the details are all in the 'get_pth_contents()', and 'modify__path__()' functions, and the above process would do extra work in the case where an individual importer implements PEP 382 on its own (although why would it?).
It's also the case that this algorithm will be slow to fail imports when implemented as a meta_path hook, since it will be doing an extra pass over sys.path or the parent __path__, in addition to the one that's done by the normal __import__ machinery. (Though that's not an issue for Python 3.x, since this can be built into the core __import__).
(Technically, the 3.x version should probably ask meta_path hooks for their .pth files as well, but I'm not entirely sure that that's a meaningful thing to ask.)
The PEP 302 questions all boil down to how get_pth_contents() is implemented, and whether 'subpath' really should be created with os.path.join. Simply adding a get_pth_contents() method to the importer protocol (that returns None or a list of lines), and maybe a get_subpath(modulename) method that returns the path string that should be used for a subdirectory importer (i.e. __path__ entry), or None if no such subpath exists, should be enough.
Code already out there uses os.path.join() to extend __path__ (e.g. Django), so I would stick with that unless we want to start transitioning to '/' only.
Looking at that proposal, I don't follow how changing *loaders* (vs. importers) would help. If an importer's find_module doesn't natively support PEP 382, then there's no way to get a loader for the package in the first place.
True. However, this requires no changes to the API, AFAICT. The *finder* (there are no importers, right?) will need to accept a folder with just a pth file as a package, but then still return a loader. The issue is then how to make the loader scan the folder for .pth files. One option would be to ask it "give me the contents of all the pth files", the other would be to have a method "load all the pth files".
In the PEP 302 scheme, then, it's either importers that have to change, or the process that invokes them. Being able to ask an importer the equivalents of os.path.join, listdir, and get_data would suffice to make an import process that could do the trick.
That would also work, but it would make a fairly wide interface. IIRC, MAL complained that doing so would break importers which can't do listdir.
Essentially, you'd ask each importer to first attempt to find the module, and then asking it (or the loader, if the find worked) whether packagename/*.pth exists, and then processing their contents.
The latter is my proposal, yes: ask the loader to process all pth files.
If at any point the find_module() call succeeds, then subsequent importers will just be asked for .pth files, which can then be processed into the __path__ of the now-loaded module.
IOW, something like this (very rough draft):
    pth_contents = []
    module = None

    for pathitem in syspath_or_parent__path__:

        importer = pkgutil.get_importer(pathitem)
        if importer is None:
            continue

        if module is None:
            try:
                loader = importer.find_module(fullname)
            except ImportError:
                pass
Is this really how it works today? Shouldn't it abort here if there is an ImportError?
            else:
                # errors here should propagate
                module = loader.load_module(fullname)
                if not hasattr(module, '__path__'):
                    # found, but not a package
                    return module

        pc = get_pth_contents(importer)
Assuming we always get here with a loader, I'd rather call this on the loader.

Regards,
Martin
For finders, their search algorithm is changed in a couple of ways. One is that modules are given priority over packages (is that intentional, Martin, or just an oversight?).
That's an oversight. Notice, however, that it's really not the case that currently directories have precedence over modules, either: if a directory is later on the __path__ than a module, it's still the module that gets imported. So the precedence takes place only when a module and a directory exist in the same directory. In any case, I have now fixed it.
Two, the package search requires checking for a .pth file on top of an __init__.py. This will change finders that could before simply do an existence check on an __init__ "file"
You are reading something into the PEP that isn't there yet. PEP 302 currently isn't considered, and the question of this discussion is precisely how the loaders API should be changed.
(or whatever the storage back-end happened to be) and make it into a list-and-search, which one would hope wasn't costly, but in some cases might be if the paths to files are not stored in a hierarchical fashion (e.g. zip files list entire file paths in their TOC, or a sqlite3 DB which uses a path for keys will have to list **all** keys, filter them down to just the relevant directory, and then look for .pth, or some such approach).
First, I think it's up to the specific loader mechanism whether PEP 382 should be supported at all. It should be possible to implement it if desired, but if it's not feasible (e.g. for URL loaders), pth files just may not get considered. The loader may well provide a different mechanism to support namespace packages.
Are we worried about possible performance implications of this search?
For the specific case of zip files, I'm not. I don't think performance will suffer at all.
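The list-and-search concern for flat stores can be illustrated with a single pass over the keys. This is a sketch under assumptions: `package_markers` is a hypothetical helper, and `flat_keys` models a zip table of contents or a path-keyed database as a plain iterable of '/'-separated paths.

```python
def package_markers(flat_keys, package_dir):
    """Return (has_init, pth_names) for package_dir in a flat key store."""
    prefix = package_dir.rstrip("/") + "/"
    has_init = False
    pth_names = []
    for key in flat_keys:
        if not key.startswith(prefix):
            continue  # belongs to some other directory
        rest = key[len(prefix):]
        if "/" in rest:
            continue  # lives in a sub-directory of package_dir
        if rest == "__init__.py":
            has_init = True
        elif rest.endswith(".pth"):
            pth_names.append(rest)
    return has_init, sorted(pth_names)
```

The cost is one linear scan of the key list per package lookup, where previously a single existence check for __init__.py was enough. An index keyed by directory would bring the lookup back to constant time.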
And then the search for the __init__.py begins on the newly modified __path__, which I assume ends with the first __init__ found on __path__, but if no file is found it's okay and essentially an empty module with just module-specific attributes is used?
Correct.
In other words, can a .pth file replace an __init__ file in delineating a package?
That's what it means by '''a directory is considered a package if it either contains a file named __init__.py, or a file whose name ends with ".pth".'''
Or is it purely additive? I assume the latter for compatibility reasons, but the PEP says "a directory is considered a package if it **either** contains a file named __init__.py, **or** a file whose name ends with ".pth"" (emphasis mine).
Why do you think this causes an incompatibility?
Otherwise I assume that the search will be done simply with ``os.path.isdir(os.path.join(sys_path_entry, top_level_package_name))`` and all existing paths will be added to __path__. Will they come before or after the directory where the *.pth was found? And will any subsequent *.pth files found in other directories also be executed?
I may misremember, but from reading the text, it seems to say "no". It should work like the current pth mechanism (plus *, minus import).
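The directory search asked about in the question above can be sketched as follows. `namespace_portions` is a hypothetical helper, and the ordering it produces (search-path order) is an assumption; where these entries go relative to the directory holding the .pth file is exactly the open question in this exchange.

```python
import os


def namespace_portions(package_name, search_path):
    """Collect every existing <entry>/<package_name> directory, in order."""
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, package_name)
        if os.path.isdir(candidate):
            portions.append(candidate)
    return portions
```

For a top-level package `search_path` would be sys.path. This only covers real file system directories, as pkgutil does, and says nothing about other storage back-ends.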
As for how "*" works, is this limited to top-level packages, or will sub-packages participate as well? I assume the former, but it is not directly stated in the PEP.
And indeed, the latter is intended. You should be able to create namespace packages on all levels.
If the latter, is a dotted package name changed to ``os.path.join(sys_path_entry, package_name.replace(".", os.sep))``?
No. Instead, the parent package's __path__ is being searched for directories; sys.path is not considered anymore. I have fixed the text.
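The rule in this answer, that a sub-package is looked up on its parent's __path__ rather than on sys.path, can be written down in a few lines. `search_path_for` is a hypothetical helper and the `modules` parameter exists only so the sketch can be tried without importing anything.

```python
import sys


def search_path_for(fullname, modules=None):
    """Return the list of entries to search when importing `fullname`."""
    modules = sys.modules if modules is None else modules
    parent, _, _ = fullname.rpartition(".")
    if not parent:
        # Top-level name: search sys.path.
        return sys.path
    # Sub-package or submodule: only the parent's __path__ is consulted.
    return modules[parent].__path__
```

This is why no dotted-name-to-directory translation is needed: by the time `zope.interface` is searched for, `zope` has already been imported and its __path__ lists every directory that may contain the `interface` portion.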
For sys.path_hooks, I am assuming import will simply skip over passing "*" to the hooks, as it is a marker that __path__ represents a namespace package and is not in any way functional. Although with sys.namespace_packages, is leaving the "*" in __path__ truly necessary?
It would help with efficiency, no?
For the search of paths to use to extend, are we limiting ourselves to actual file system entries on sys.path (as pkgutil does), or do we want to support other storage back-ends? To do the latter I would suggest having a successful path discovery be when a finder can be created for the hypothetical directory from sys.path_hooks.
Again: PEP 302 isn't really considered yet. Proposals are welcome.
The PEP (seems to) ask finders to look for a .pth file(s), calculate __path__, and then get a loader for the __init__. You could have finders grow a find_namespace method which returns the contents of the requisite .pth file(s).
I must be misunderstanding the concept of finders. Why is it that it would be their function to process the pth files, and not the function of the loader?

Regards,
Martin
On Mon, May 31, 2010 at 00:53, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Or is it purely additive? I assume the latter for compatibility reasons, but the PEP says "a directory is considered a package if it **either** contains a file named __init__.py, **or** a file whose name ends with ".pth"" (emphasis mine).
Why do you think this causes an incompatibility?
It's just that if a project has no __init__, older finders won't process it, that's all. But it looks like they are going to have to change somewhat anyway, so that's not an issue.
For sys.path_hooks, I am assuming import will simply skip over passing "*" to the hooks, as it is a marker that __path__ represents a namespace package and is not in any way functional. Although with sys.namespace_packages, is leaving the "*" in __path__ truly necessary?
It would help with efficiency, no?
Not sure how having "*" in __path__ helps with efficiency. What are you thinking it will help with specifically?
The PEP (seems to) ask finders to look for a .pth file(s), calculate __path__, and then get a loader for the __init__. You could have finders grow a find_namespace method which returns the contents of the requisite .pth file(s).
I must be misunderstanding the concept of finders. Why is it that it would be their function to process the pth files, and not the function of the loader?
I'm thinking from the perspective of finding an __init__ module that exists somewhere else than where the .pth file was discovered. Someone has to find the .pth files and provide their contents. Someone else needs to find the __init__ module for the package (if it exists). Then someone needs to load a namespace package, potentially from the __init__ module. It's that second step -- find the __init__ module -- that makes me think the finder is more involved. It doesn't have to be by definition, but seeing the word "find" just makes me think "finder".
participants (3)
- "Martin v. Löwis"
- Brett Cannon
- P.J. Eby