PEP 420 - dynamic path computation is missing rationale
I have just reviewed PEP 420 (namespace packages) and sent Eric my detailed feedback; most of it is minor or requests examples, and I'm sure he'll fix it to my satisfaction. Generally speaking the PEP is a beacon of clarity. But I stumbled over one feature that bothers me both in its specification and in its lack of rationale. This is the section on Dynamic Path Computation: http://www.python.org/dev/peps/pep-0420/#dynamic-path-computation

The specification bothers me because it requires in-place modification of sys.path. Does this mean sys.path is no longer a plain list? I'm sure it's going to break things left and right (or at least things will be violating this requirement left and right); there has never been a similar requirement (unlike, e.g., sys.modules, which is relatively well known for being cached in a C-level global variable).

Worse, this apparently affects the __path__ variables of namespace packages as well, which are now specified as an unspecified read-only iterable. (I can only guess that there is a connection between these two features -- the PEP doesn't mention one.) Again, I would be much happier with just a list.

While I can imagine there being a use case for recomputing the various paths, I am much less sure that it is worth attempting to specify that this will happen *automatically* when sys.path is modified in a certain way. I'd be much happier if these constraints were struck and the recomputation had to be requested explicitly by calling some new function in sys.
From my POV, this is the only show-stopper for acceptance of PEP 420. (That is, either a rock-solid rationale should be supplied, or the constraints should be removed.)
-- --Guido van Rossum (python.org/~guido)
On 5/20/2012 9:33 PM, Guido van Rossum wrote:
Generally speaking the PEP is a beacon of clarity. But I stumbled over one feature that bothers me both in its specification and in its lack of rationale. This is the section on Dynamic Path Computation: http://www.python.org/dev/peps/pep-0420/#dynamic-path-computation

The specification bothers me because it requires in-place modification of sys.path. Does this mean sys.path is no longer a plain list? I'm sure it's going to break things left and right (or at least things will be violating this requirement left and right); there has never been a similar requirement (unlike, e.g., sys.modules, which is relatively well known for being cached in a C-level global variable).

Worse, this apparently affects the __path__ variables of namespace packages as well, which are now specified as an unspecified read-only iterable. (I can only guess that there is a connection between these two features -- the PEP doesn't mention one.) Again, I would be much happier with just a list.
sys.path would still be a plain list. It's the namespace package's __path__ that would be a special object. Every time __path__ is accessed it checks to see if its parent path has been modified. The parent path for top-level modules is sys.path. The __path__ object detects modification by keeping a local copy of the parent, plus a reference to the parent, and compares them.
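[Editor's note: for illustration, a minimal sketch of the comparison mechanism Eric describes might look like the following. The class name, constructor signature, and the `recompute` callback are illustrative assumptions, not the PEP's actual implementation.]

    class _SelfUpdatingPath:
        # Sketch only: recompute our portions when the parent path is
        # mutated in place. 'recompute' is an assumed callable taking
        # (name, parent_snapshot) and returning a fresh list of portions.
        def __init__(self, name, parent_path, recompute):
            self._name = name
            self._parent = parent_path            # reference to the parent list
            self._snapshot = tuple(parent_path)   # local copy for comparison
            self._recompute = recompute
            self._portions = recompute(name, self._snapshot)

        def __iter__(self):
            current = tuple(self._parent)
            if current != self._snapshot:         # parent was modified in place
                self._snapshot = current
                self._portions = self._recompute(self._name, current)
            return iter(self._portions)

For a top-level package this would be constructed with sys.path as parent_path, which is exactly why *replacing* sys.path (rather than mutating it) would defeat the check.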
While I can imagine there being a use case for recomputing the various paths, I am much less sure that it is worth attempting to specify that this will happen *automatically* when sys.path is modified in a certain way. I'd be much happier if these constraints were struck and the recomputation had to be requested explicitly by calling some new function in sys.
From my POV, this is the only show-stopper for acceptance of PEP 420. (That is, either a rock-solid rationale should be supplied, or the constraints should be removed.)
I don't have a preference on whether the feature stays or goes, so I'll let PJE give the use case. I've copied him here in case he doesn't read python-dev. Now that I think about it some more, the motivation is probably to ease the migration from setuptools, which does provide this feature. Eric.
On Mon, May 21, 2012 at 1:00 AM, Eric V. Smith <eric@trueblade.com> wrote:
On 5/20/2012 9:33 PM, Guido van Rossum wrote:
Generally speaking the PEP is a beacon of clarity. But I stumbled over one feature that bothers me both in its specification and in its lack of rationale. This is the section on Dynamic Path Computation: http://www.python.org/dev/peps/pep-0420/#dynamic-path-computation

The specification bothers me because it requires in-place modification of sys.path. Does this mean sys.path is no longer a plain list? I'm sure it's going to break things left and right (or at least things will be violating this requirement left and right); there has never been a similar requirement (unlike, e.g., sys.modules, which is relatively well known for being cached in a C-level global variable).

Worse, this apparently affects the __path__ variables of namespace packages as well, which are now specified as an unspecified read-only iterable. (I can only guess that there is a connection between these two features -- the PEP doesn't mention one.) Again, I would be much happier with just a list.
sys.path would still be a plain list. It's the namespace package's __path__ that would be a special object. Every time __path__ is accessed it checks to see if its parent path has been modified. The parent path for top-level modules is sys.path. The __path__ object detects modification by keeping a local copy of the parent, plus a reference to the parent, and compares them.
Ah, I see. But I disagree that this is a reasonable constraint on sys.path. The magic __path__ object of a toplevel namespace module should know it is a toplevel module, and explicitly refetch sys.path rather than just keeping around a copy.

This leaves the magic __path__ objects for namespace modules, which I could live with, as long as their repr was not the same as a list, and assuming a good rationale is given. Although I'd still prefer plain lists here as well; I'd like to be able to manually construct a namespace package and force its directories to be a specific set of directories that I happen to know about, regardless of whether they are related to sys.path or not. And I'd like to know that my setup in that case would not be disturbed by changes to sys.path.
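[Editor's note: the kind of manual, fixed-directory setup Guido describes could be approximated with a plain module object; the package name and directories below are placeholders.]

    import sys
    import types

    # A hypothetical hand-built package whose search path is a plain,
    # fixed list, deliberately decoupled from sys.path.
    pkg = types.ModuleType('mydata')
    pkg.__path__ = ['/srv/plugins/mydata', '/opt/extra/mydata']
    sys.modules['mydata'] = pkg

    # 'import mydata.config' would now search only those two directories,
    # regardless of any later changes to sys.path.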
While I can imagine there being a use case for recomputing the various paths, I am much less sure that it is worth attempting to specify that this will happen *automatically* when sys.path is modified in a certain way. I'd be much happier if these constraints were struck and the recomputation had to be requested explicitly by calling some new function in sys.
From my POV, this is the only show-stopper for acceptance of PEP 420. (That is, either a rock-solid rationale should be supplied, or the constraints should be removed.)
I don't have a preference on whether the feature stays or goes, so I'll let PJE give the use case. I've copied him here in case he doesn't read python-dev.
Now that I think about it some more, the motivation is probably to ease the migration from setuptools, which does provide this feature.
I'd like to hear more about this from Philip -- is that feature actually widely used? What would a package have to do if the feature didn't exist?

I'd really much rather not have this feature, which reeks of too much magic to me. (An area where Philip and I often disagree. :-)

-- --Guido van Rossum (python.org/~guido)
On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <guido@python.org> wrote:
Ah, I see. But I disagree that this is a reasonable constraint on sys.path. The magic __path__ object of a toplevel namespace module should know it is a toplevel module, and explicitly refetch sys.path rather than just keeping around a copy.
That's fine by me - the class could actually be defined to take a module name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then there'd be no need to special case anything: it would behave exactly the same way for subpackages and top-level packages.
This leaves the magic __path__ objects for namespace modules, which I could live with, as long as their repr was not the same as a list, and assuming a good rationale is given. Although I'd still prefer plain lists here as well; I'd like to be able to manually construct a namespace package and force its directories to be a specific set of directories that I happen to know about, regardless of whether they are related to sys.path or not. And I'd like to know that my setup in that case would not be disturbed by changes to sys.path.
To do that, you just assign to __path__, the same as now, a la __path__ = pkgutil.extend_path(). The auto-updating is in the initially-assigned __path__ object, not the module object or some sort of generalized magic.
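[Editor's note: for reference, the assignment idiom PJE mentions is normally written at the top of a package's __init__.py; pkgutil.extend_path scans sys.path for other directories providing portions of the same package.]

    # In the package's __init__.py:
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)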
I'd like to hear more about this from Philip -- is that feature actually widely used?
Well, it's built into setuptools, so yes. ;-) It gets used any time a dynamically specified dependency is used that might contain a namespace package. This means, for example, that every setup script out there using "setup.py test", every project using certain paste.deploy features... it's really difficult to spell out the scope of things that are using this, in the context of setuptools and distribute, because there are an immense number of ways to indirectly rely on it.

This doesn't mean that the feature can't continue to be implemented inside setuptools' dynamic dependency system, but the code to do it in setuptools is MUCH more complicated than the PEP 420 code, and doesn't work if you manually add something to sys.path without asking setuptools to do it. It's also somewhat timing-sensitive, depending on when and whether you import 'site' and pkg_resources, and whether you are mixing eggs and non-eggs in your namespace packages.

In short, the implementation is a huge mess that the PEP 420 approach would vastly simplify.

But... that wasn't the original reason why I proposed it. The original reason was simply that it makes namespace packages act more like the equivalents do in other languages. While being able to override __path__ can be considered a feature of Python, its being static by default is NOT a feature, in the same way that *requiring* an __init__.py is not really a feature.

The principle of least surprise says (at least IMO) that if you add a directory to sys.path, you should be able to import stuff from it. That whether it works depends on whether or not you already imported part of a namespace package earlier is both surprising and confusing. (More on this below.)
What would a package have to do if the feature didn't exist?
Continue to depend on setuptools to do it for them, or use some hypothetical update API... but that's not really the right question. ;-)

The right question is, what happens to package *users* if the feature didn't exist?

And the answer to that question is, "you must call this hypothetical update API *every time* you change sys.path, because otherwise your imports might break, depending on whether or not some other package imported something from a namespace before you changed sys.path".

And of course, you also need to make sure that any third-party code you use does this too, if it adds something to sys.path for you. And if you're writing cross-Python-version code, you need to check to make sure whether the API is actually available. And if you're someone helping Python newbies, you need to add this to your list of debugging questions for import-related problems.

And remember: if you forget to do this, it might not break now. It'll break later, when you add that other plugin or update that random module that dynamically decides to import something that just happens to be in a namespace package, so be prepared for it to break your application in the field, when an end-user is using it with a collection of plugins that you haven't tested together, or in the same import sequence...

The people using setuptools won't have these problems, but *new* Python users will, as people begin using a PEP 420 that lacks this feature.

The key scope question, I think, is: "How often do programs change sys.path at runtime, and what have they imported up to that point?" (Because for the other part of the scope, I think it's a fairly safe bet that namespace packages are going to become even *more* popular than they are now, once PEP 420 is in place.)

But the key API/usability question is: "What's the One Obvious Way to add/change what's importable?"

And I believe the answer to that question is, "change sys.path", not "change sys.path, and then import some other module to call another API to say, 'yes, I really *meant* to update sys.path, thank you very much.'" (Especially since NOT requiring that extra API isn't going to break any existing code.)
I'd really much rather not have this feature, which reeks of too much magic to me. (An area where Philip and I often disagree. :-)
My take on it is that it only SEEMS like magic, because we're used to static __path__. But other languages don't have per-package __path__ in the first place, so there's nothing to "automatically update", and so it's not magic at all that other subpackages/modules can be found when the system path changes!

So, under the PEP 420 approach, it's *static* __path__ that's really the weird special case, and should be considered so. (After all, __path__ is and was primarily an implementation optimization and compatibility hack, rather than a user-facing "feature" of the import system.)

For example, when *would* you want to explicitly spell out a namespace package __path__, and restrict it from seeing sys.path changes? I've not seen *anybody* ask for this feature in the context of setuptools; it's only ever been bug reports about when the more complicated implementation fails to detect an update.

So, to wrap up:

* The primary rationale for the feature is that "least surprise" for a new user to Python is that adding to sys.path should allow importing a portion of a namespace, whether or not you've already imported some other thing in that namespace. Symmetry with other languages and with other Python features (e.g. changing the working directory in an interactive interpreter) suggests it, and the removal of a similar timing dependency from PEP 402 (preventing direct import of a namespace-only package unless you imported a subpackage first) suggests that the same type of timing dependency should be removed here, too. (Note, for example, that I may not know that importing baz.spam indirectly causes some part of foo.wiz to be imported, and that if I then add another directory to sys.path containing a foo.* portion, my code will *no longer work* when I try to import foo.ham. This is much more "magical" behavior, in least-surprise terms!)

* The constraints on sys.path and package __path__ objects can and should be removed, by making the dynamic path objects refer to a module and attribute, instead of directly referencing parent __path__ objects. Code that currently manipulates __path__ will not break, because such code will not be using PEP 420 namespace packages anyway (and so, __path__ will be a list). (Even so, the most common __path__ manipulation idiom is "__path__ = pkgutil.extend_path(...)" anyway!)

* Namespace packages are a widely used feature of setuptools, and AFAIK nobody has *ever* asked to stop dynamic additions to namespace __path__, but a wide assortment of things people do with setuptools rely on dynamic additions under the hood. Providing the feature in PEP 420 gives a migration path away from setuptools, at least for this one feature. (Specifically, it does away with the need to use declare_namespace(), and the need to do all sys.path manipulation via setuptools' requirements API.)

* Self-contained (__init__.py) packages and fixed __path__ lists can and should be considered the "magic" or "special case" parts of importing in Python 3, even though we're accustomed to them being central import concepts in Python 2. Modules and namespace packages can and should be the default case from an instructional POV, and sys.path updating should reflect this. (That is, future tutorials should introduce modules, then namespace packages, and finally self-contained packages with __init__ and __path__, because the *idea* of a namespace package doesn't depend on __path__ existing in the first place; it's essentially only a historical accident that self-contained packages were implemented in Python first.)
As a simple example to back up PJE's explanation, consider:

1. encodings becomes a namespace package
2. It sometimes gets imported during interpreter startup to initialise the standard io streams
3. An application modifies sys.path after startup and wants to contribute additional encodings

Searching the entire parent path for new portions on every import would be needlessly slow.

Not recognising new portions would be needlessly confusing for users. In our simple case above, the application would fail if the io initialisation accessed the encodings package, but work if it did not (e.g. when all streams are utf-8).

PEP 420 splits the difference via an automatically invalidated cache: when you iterate over a namespace package __path__ object, it rescans the parent path for new portions *if and only if* the contents of the parent path have changed since the previous scan.

Cheers, Nick. -- Sent from my phone, thus the relative brevity :)

On May 22, 2012 4:10 AM, "PJ Eby" <pje@telecommunity.com> wrote:
On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <guido@python.org> wrote:
Ah, I see. But I disagree that this is a reasonable constraint on sys.path. The magic __path__ object of a toplevel namespace module should know it is a toplevel module, and explicitly refetch sys.path rather than just keeping around a copy.
That's fine by me - the class could actually be defined to take a module name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then there'd be no need to special case anything: it would behave exactly the same way for subpackages and top-level packages.
This leaves the magic __path__ objects for namespace modules, which I could live with, as long as their repr was not the same as a list, and assuming a good rationale is given. Although I'd still prefer plain lists here as well; I'd like to be able to manually construct a namespace package and force its directories to be a specific set of directories that I happen to know about, regardless of whether they are related to sys.path or not. And I'd like to know that my setup in that case would not be disturbed by changes to sys.path.
To do that, you just assign to __path__, the same as now, a la __path__ = pkgutil.extend_path(). The auto-updating is in the initially-assigned __path__ object, not the module object or some sort of generalized magic.
I'd like to hear more about this from Philip -- is that feature actually widely used?
Well, it's built into setuptools, so yes. ;-) It gets used any time a dynamically specified dependency is used that might contain a namespace package. This means, for example, that every setup script out there using "setup.py test", every project using certain paste.deploy features... it's really difficult to spell out the scope of things that are using this, in the context of setuptools and distribute, because there are an immense number of ways to indirectly rely on it.
This doesn't mean that the feature can't continue to be implemented inside setuptools' dynamic dependency system, but the code to do it in setuptools is MUCH more complicated than the PEP 420 code, and doesn't work if you manually add something to sys.path without asking setuptools to do it. It's also somewhat timing-sensitive, depending on when and whether you import 'site' and pkg_resources, and whether you are mixing eggs and non-eggs in your namespace packages.
In short, the implementation is a huge mess that the PEP 420 approach would vastly simplify.
But... that wasn't the original reason why I proposed it. The original reason was simply that it makes namespace packages act more like the equivalents do in other languages. While being able to override __path__ can be considered a feature of Python, its being static by default is NOT a feature, in the same way that *requiring* an __init__.py is not really a feature.
The principle of least surprise says (at least IMO) that if you add a directory to sys.path, you should be able to import stuff from it. That whether it works depends on whether or not you already imported part of a namespace package earlier is both surprising and confusing. (More on this below.)
What would a package have to do if the feature didn't exist?
Continue to depend on setuptools to do it for them, or use some hypothetical update API... but that's not really the right question. ;-)
The right question is, what happens to package *users* if the feature didn't exist?
And the answer to that question is, "you must call this hypothetical update API *every time* you change sys.path, because otherwise your imports might break, depending on whether or not some other package imported something from a namespace before you changed sys.path".
And of course, you also need to make sure that any third-party code you use does this too, if it adds something to sys.path for you.
And if you're writing cross-Python-version code, you need to check to make sure whether the API is actually available.
And if you're someone helping Python newbies, you need to add this to your list of debugging questions for import-related problems.
And remember: if you forget to do this, it might not break now. It'll break later, when you add that other plugin or update that random module that dynamically decides to import something that just happens to be in a namespace package, so be prepared for it to break your application in the field, when an end-user is using it with a collection of plugins that you haven't tested together, or in the same import sequence...
The people using setuptools won't have these problems, but *new* Python users will, as people begin using a PEP 420 that lacks this feature.
The key scope question, I think, is: "How often do programs change sys.path at runtime, and what have they imported up to that point?" (Because for the other part of the scope, I think it's a fairly safe bet that namespace packages are going to become even *more* popular than they are now, once PEP 420 is in place.)
But the key API/usability question is: "What's the One Obvious Way to add/change what's importable?"
And I believe the answer to that question is, "change sys.path", not "change sys.path, and then import some other module to call another API to say, 'yes, I really *meant* to update sys.path, thank you very much.'"
(Especially since NOT requiring that extra API isn't going to break any existing code.)
I'd really much rather not have this feature, which reeks of too much magic to me. (An area where Philip and I often disagree. :-)
My take on it is that it only SEEMS like magic, because we're used to static __path__. But other languages don't have per-package __path__ in the first place, so there's nothing to "automatically update", and so it's not magic at all that other subpackages/modules can be found when the system path changes!
So, under the PEP 420 approach, it's *static* __path__ that's really the weird special case, and should be considered so. (After all, __path__ is and was primarily an implementation optimization and compatibility hack, rather than a user-facing "feature" of the import system.)
For example, when *would* you want to explicitly spell out a namespace package __path__, and restrict it from seeing sys.path changes? I've not seen *anybody* ask for this feature in the context of setuptools; it's only ever been bug reports about when the more complicated implementation fails to detect an update.
So, to wrap up:
* The primary rationale for the feature is that "least surprise" for a new user to Python is that adding to sys.path should allow importing a portion of a namespace, whether or not you've already imported some other thing in that namespace. Symmetry with other languages and with other Python features (e.g. changing the working directory in an interactive interpreter) suggests it, and the removal of a similar timing dependency from PEP 402 (preventing direct import of a namespace-only package unless you imported a subpackage first) suggests that the same type of timing dependency should be removed here, too. (Note, for example, that I may not know that importing baz.spam indirectly causes some part of foo.wiz to be imported, and that if I then add another directory to sys.path containing a foo.* portion, my code will *no longer work* when I try to import foo.ham. This is much more "magical" behavior, in least-surprise terms!)
* The constraints on sys.path and package __path__ objects can and should be removed, by making the dynamic path objects refer to a module and attribute, instead of directly referencing parent __path__ objects. Code that currently manipulates __path__ will not break, because such code will not be using PEP 420 namespace packages anyway (and so, __path__ will be a list). (Even so, the most common __path__ manipulation idiom is "__path__ = pkgutil.extend_path(...)" anyway!)
* Namespace packages are a widely used feature of setuptools, and AFAIK nobody has *ever* asked to stop dynamic additions to namespace __path__, but a wide assortment of things people do with setuptools rely on dynamic additions under the hood. Providing the feature in PEP 420 gives a migration path away from setuptools, at least for this one feature. (Specifically, it does away with the need to use declare_namespace(), and the need to do all sys.path manipulation via setuptools' requirements API.)
* Self-contained (__init__.py packages) and fixed __path__ lists can and should be considered the "magic" or "special case" parts of importing in Python 3, even though we're accustomed to them being central import concepts in Python 2. Modules and namespace packages can and should be the default case from an instructional POV, and sys.path updating should reflect this. (That is, future tutorials should introduce modules, then namespace packages, and finally self-contained packages with __init__ and __path__, because the *idea* of a namespace package doesn't depend on __path__ existing in the first place; it's essentially only a historical accident that self-contained packages were implemented in Python first.)
I agree the parent path should be retrieved by name rather than a direct reference when checking the cache validity, though. -- Sent from my phone, thus the relative brevity :)
On 05/21/2012 07:25 PM, Nick Coghlan wrote:
As a simple example to back up PJE's explanation, consider:

1. encodings becomes a namespace package
2. It sometimes gets imported during interpreter startup to initialise the standard io streams
3. An application modifies sys.path after startup and wants to contribute additional encodings
Searching the entire parent path for new portions on every import would be needlessly slow.
Not recognising new portions would be needlessly confusing for users. In our simple case above, the application would fail if the io initialisation accessed the encodings package, but work if it did not (e.g. when all streams are utf-8).
PEP 420 splits the difference via an automatically invalidated cache: when you iterate over a namespace package __path__ object, it rescans the parent path for new portions *if and only if* the contents of the parent path have changed since the previous scan.
That seems like a pretty convincing example to me.

Personally I'm +1 on putting dynamic computation into the PEP, at least for top-level namespace packages, and probably for all namespace packages. The code is not very large or complicated, and with the proposed removal of the restriction that sys.path cannot be replaced, I think it behaves well. But Guido can decide against it without hurting my feelings.

Eric.

P.S.: Here's the current code in the pep-420 branch. This code still has the restriction that sys.path (or parent_path in general) can't be replaced. I'll fix that if we decide to keep the feature.

    class _NamespacePath:
        def __init__(self, name, path, parent_path, path_finder):
            self._name = name
            self._path = path
            self._parent_path = parent_path
            self._last_parent_path = tuple(parent_path)
            self._path_finder = path_finder

        def _recalculate(self):
            # If _parent_path has changed, recalculate _path
            parent_path = tuple(self._parent_path)  # Make a copy
            if parent_path != self._last_parent_path:
                loader, new_path = self._path_finder(self._name, parent_path)
                # Note that no changes are made if a loader is returned, but
                # we do remember the new parent path
                if loader is None:
                    self._path = new_path
                self._last_parent_path = parent_path  # Save the copy
            return self._path

        def __iter__(self):
            return iter(self._recalculate())

        def __len__(self):
            return len(self._recalculate())

        def __repr__(self):
            return "_NamespacePath" + repr((self._path, self._parent_path))

        def __contains__(self, item):
            return item in self._recalculate()
On Wed, May 23, 2012 at 12:51 AM, Eric V. Smith <eric@trueblade.com> wrote:
That seems like a pretty convincing example to me.
Personally I'm +1 on putting dynamic computation into the PEP, at least for top-level namespace packages, and probably for all namespace packages.
Same here, but Guido's right that the rationale (and example) should be clearer in the PEP itself if the feature is to be retained.
P.S.: Here's the current code in the pep-420 branch. This code still has the restriction that sys.path (or parent_path in general) can't be replaced. I'll fix that if we decide to keep the feature.
I wonder if it would be worth exposing an importlib.LazyRef API to make it generally easy to avoid this kind of early binding problem?

    class LazyRef:
        # similar API to weakref.weakref
        def __init__(self, modname, attr=None):
            self.modname = modname
            self.attr = attr

        def __call__(self):
            mod = sys.modules[self.modname]
            attr = self.attr
            if attr is None:
                return mod
            return getattr(mod, attr)

Then _NamespacePath could just be defined as taking a callable that returns the parent path:

    class _NamespacePath:
        def __init__(self, name, path, parent_path, path_finder):
            self._name = name
            self._path = path
            self._parent_path = parent_path  # Now a callable
            self._last_parent_path = tuple(parent_path())
            self._path_finder = path_finder

        def _recalculate(self):
            # If _parent_path has changed, recalculate _path
            parent_path = tuple(self._parent_path())  # Retrieve and make a copy
            if parent_path != self._last_parent_path:
                loader, new_path = self._path_finder(self._name, parent_path)
                # Note that no changes are made if a loader is returned, but
                # we do remember the new parent path
                if loader is None:
                    self._path = new_path
                self._last_parent_path = parent_path  # Save the copy
            return self._path

Even if the LazyRef idea isn't used, I still like the idea of passing a callable in to _NamespacePath for the parent path rather than hardcoding the "module name + attribute name" approach.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, May 23, 2012 at 1:39 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
    def _recalculate(self):
        # If _parent_path has changed, recalculate _path
        parent_path = tuple(self._parent_path())  # Retrieve and make a copy
        if parent_path != self._last_parent_path:
            loader, new_path = self._path_finder(self._name, parent_path)
            # Note that no changes are made if a loader is returned, but
            # we do remember the new parent path
            if loader is None:
                self._path = new_path
            self._last_parent_path = parent_path  # Save the copy
        return self._path
Oops, I also meant to say that it's probably worth at least issuing ImportWarning if a new portion with an __init__.py gets added - it's going to block all future dynamic updates of that namespace package. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 05/22/2012 11:39 AM, Nick Coghlan wrote:
On Wed, May 23, 2012 at 12:51 AM, Eric V. Smith <eric@trueblade.com> wrote:
That seems like a pretty convincing example to me.
Personally I'm +1 on putting dynamic computation into the PEP, at least for top-level namespace packages, and probably for all namespace packages.
Same here, but Guido's right that the rationale (and example) should be clearer in the PEP itself if the feature is to be retained.
Completely agreed. I'll work on it.
Oops, I also meant to say that it's probably worth at least issuing ImportWarning if a new portion with an __init__.py gets added - it's going to block all future dynamic updates of that namespace package.
Right. That's on my list of things to clean up. It actually won't block updates during this run of Python, though: once a namespace package, always a namespace package. But if, on another run, that entry is on sys.path, then yes, it will block all namespace package portions. Eric.
On Tue, May 22, 2012 at 12:31 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 05/22/2012 11:39 AM, Nick Coghlan wrote:
Oops, I also meant to say that it's probably worth at least issuing ImportWarning if a new portion with an __init__.py gets added - it's going to block all future dynamic updates of that namespace package.
Right. That's on my list of things to clean up. It actually won't block updates during this run of Python, though: once a namespace package, always a namespace package. But if, on another run, that entry is on sys.path, then yes, it will block all namespace package portions.
This discussion has gotten me thinking: should we expose a pkgutil.declare_namespace() API to allow such an __init__.py to turn itself back into a namespace? (Per our previous discussion on transitioning existing namespace packages.) It wouldn't need to do all the other stuff that the setuptools version does, it would just be a way to transition away from setuptools. What it would do is:

1. Recursively invoke itself for parent packages
2. Create the module object if it doesn't already exist
3. Set the module __path__ to a _NamespacePath instance.

    def declare_namespace(package_name):
        parent, dot, tail = package_name.rpartition('.')
        attr = '__path__'
        if dot:
            declare_namespace(parent)
        else:
            parent, attr = 'sys', 'path'
        with importlockcontext:
            module = sys.modules.get(package_name)
            if module is None:
                module = XXX new module here
            module.__path__ = _NamespacePath(...stuff involving 'parent' and 'attr')

It may be that this should complain under certain circumstances, or use the '__path__ = something' idiom, but the above approach would be (basically) API compatible with the standard usage of declare_namespace.

Obviously, this'll only be useful for people who are porting code going forward, but even if a different API is chosen, there still ought to be a way for people to do it. Namespace packages are one of a handful of features that are still basically setuptools-only at this point (i.e. not yet provided by packaging/distutils2), but if it's the only setuptools-only feature a project is using, they'd be able to drop their dependency as of 3.3.

(Next up, I guess we'll need an entry-points PEP, but that'll be another discussion. ;-) )
Minor nit. On May 22, 2012, at 04:43 PM, PJ Eby wrote:
    def declare_namespace(package_name):
        parent, dot, tail = package_name.rpartition('.')
        attr = '__path__'
        if dot:
            declare_namespace(parent)
        else:
            parent, attr = 'sys', 'path'
        with importlockcontext:
            module = sys.modules.get(package_name)
Best to use a marker object here instead of checking for None, since the latter is a valid value for an existing entry in sys.modules.
            if module is None:
                module = XXX new module here
            module.__path__ = _NamespacePath(...stuff involving 'parent' and 'attr')
Cheers, -Barry
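[Editor's note: a minimal sketch of the sentinel pattern Barry suggests. The helper and the _MISSING name are hypothetical, and the module-creation step merely stands in for the "XXX new module here" placeholder above.]

    import sys
    import types

    _MISSING = object()  # unique sentinel; hypothetical name

    def get_or_create_module(package_name):
        # Distinguish "no entry at all" from an entry that is explicitly
        # None (a valid sys.modules value meaning the import is blocked).
        module = sys.modules.get(package_name, _MISSING)
        if module is _MISSING:
            module = types.ModuleType(package_name)
            sys.modules[package_name] = module
        elif module is None:
            raise ImportError('import of %r is blocked' % (package_name,))
        return module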
On 5/21/2012 2:08 PM, PJ Eby wrote:
On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <guido@python.org> wrote:
Ah, I see. But I disagree that this is a reasonable constraint on sys.path. The magic __path__ object of a toplevel namespace module should know it is a toplevel module, and explicitly refetch sys.path rather than just keeping around a copy.
That's fine by me - the class could actually be defined to take a module name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then there'd be no need to special case anything: it would behave exactly the same way for subpackages and top-level packages.
Any reason to make this the string "sys" or "foo", and not the module itself? Can the module be replaced in sys.modules? Mostly I'm just curious. But regardless, I'm okay with keeping these both as strings and looking them up in sys.modules and then by attribute. Eric.
On Mon, May 21, 2012 at 8:32 PM, Eric V. Smith <eric@trueblade.com> wrote:
Any reason to make this the string "sys" or "foo", and not the module itself? Can the module be replaced in sys.modules? Mostly I'm just curious.
Probably not, but it occurred to me that storing references to modules introduces a reference cycle that wasn't there when we were pointing to parent path objects instead. It basically would make child packages point to their parents, as well as the other way around.
Okay, I've been convinced that keeping the dynamic path feature is a good idea. I am really looking forward to seeing the rationale added to the PEP -- that's pretty much the last thing on my list that made me hesitate. I'll leave the details of exactly how the parent path is referenced up to the implementation team (several good points were made), as long as the restriction that sys.path must be modified in place is lifted. -- --Guido van Rossum (python.org/~guido)
On 5/22/2012 2:37 PM, Guido van Rossum wrote:
Okay, I've been convinced that keeping the dynamic path feature is a good idea. I am really looking forward to seeing the rationale added to the PEP -- that's pretty much the last thing on my list that made me hesitate. I'll leave the details of exactly how the parent path is referenced up to the implementation team (several good points were made), as long as the restriction that sys.path must be modified in place is lifted.
I've updated the PEP. Let me know how it looks. I have not updated the implementation yet. I'm not exactly sure how I'm going to convert from a path list of unknown origin to ('sys', 'path') or ('foo', '__path__'). I'll look at it later tonight to see if it's possible. I'm hoping it doesn't require major surgery to importlib._bootstrap. I still owe PEP updates for finder/loader examples and nested namespace package examples. But I think that's all that's needed. Eric.
On Tue, May 22, 2012 at 8:40 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 5/22/2012 2:37 PM, Guido van Rossum wrote:
Okay, I've been convinced that keeping the dynamic path feature is a good idea. I am really looking forward to seeing the rationale added to the PEP -- that's pretty much the last thing on my list that made me hesitate. I'll leave the details of exactly how the parent path is referenced up to the implementation team (several good points were made), as long as the restriction that sys.path must be modified in place is lifted.
I've updated the PEP. Let me know how it looks.
My name is misspelled in it, but otherwise it looks fine. ;-)

I have not updated the implementation yet. I'm not exactly sure how I'm going to convert from a path list of unknown origin to ('sys', 'path') or ('foo', '__path__'). I'll look at it later tonight to see if it's possible. I'm hoping it doesn't require major surgery to importlib._bootstrap.
It shouldn't - all you should need is to use getattr(sys.modules[self.modname], self.attr) instead of referencing a parent path object directly.

(The more interesting thing is what to do if the parent module goes away, due to somebody deleting the module out of sys.modules. The simplest thing to do would probably be to just keep using the cached value in that case.)

Ah crap, I just thought of something - what happens if you reload() a namespace package? Probably nothing, but should we specify what sort of nothing? ;-)
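[Editor's note: the by-name lookup with a cached fallback that PJE describes could look like this sketch; the function name and parameters are illustrative.]

    import sys

    def _lookup_parent_path(modname, attr, cached):
        # Fetch the parent path by name on every access, but fall back to
        # the last-seen value if the parent module has been deleted from
        # sys.modules.
        module = sys.modules.get(modname)
        if module is None:
            return cached
        return getattr(module, attr, cached)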
On 05/22/2012 09:49 PM, PJ Eby wrote:
On Tue, May 22, 2012 at 8:40 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 5/22/2012 2:37 PM, Guido van Rossum wrote:

Okay, I've been convinced that keeping the dynamic path feature is a good idea. I am really looking forward to seeing the rationale added to the PEP -- that's pretty much the last thing on my list that made me hesitate. I'll leave the details of exactly how the parent path is referenced up to the implementation team (several good points were made), as long as the restriction that sys.path must be modified in place is lifted.
I've updated the PEP. Let me know how it looks.
My name is misspelled in it, but otherwise it looks fine. ;-)
Oops, sorry. Fixed (I think).
I have not updated the implementation yet. I'm not exactly sure how I'm going to convert from a path list of unknown origin to ('sys', 'path') or ('foo', '__path__'). I'll look at it later tonight to see if it's possible. I'm hoping it doesn't require major surgery to importlib._bootstrap.
It shouldn't - all you should need is to use getattr(sys.modules[self.modname], self.attr) instead of referencing a parent path object directly.
The problem isn't the lookup, it's coming up with self.modname and self.attr. As it currently stands, PathFinder.find_module is given the parent path, not the module name and attribute name used to look up the parent path using sys.modules and getattr. Eric.
On Wed, May 23, 2012 at 10:31 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 05/22/2012 09:49 PM, PJ Eby wrote:
It shouldn't - all you should need is to use getattr(sys.modules[self.modname], self.attr) instead of referencing a parent path object directly.
The problem isn't the lookup, it's coming up with self.modname and self.attr. As it currently stands, PathFinder.find_module is given the parent path, not the module name and attribute name used to look up the parent path using sys.modules and getattr.
Right, that's what PJE and I were discussing. Instead of passing in the path object directly, you can instead pass an object that *lazily* retrieves the path object in its __iter__ method:

    class LazyIterable:
        """On iteration, retrieves a reference to a named iterable
        and returns an iterator over that iterable"""
        def __init__(self, modname, attribute):
            self.modname = modname
            self.attribute = attribute

        def __iter__(self):
            # Will almost always get a hit directly in sys.modules
            mod = import_module(self.modname)
            return iter(getattr(mod, self.attribute))

Where importlib currently passes None or sys.path as the path argument to find_module(), instead pass "LazyIterable('sys', 'path')", and where it currently passes package.__path__, instead pass "LazyIterable(package.__name__, '__path__')".

The existing for loop iteration and tuple() calls should then take care of the lazy lookup automatically.

That way, the only code that needs to know the values of modname and attribute is the code that already has access to those values.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 05/23/2012 09:02 AM, Nick Coghlan wrote:
On Wed, May 23, 2012 at 10:31 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 05/22/2012 09:49 PM, PJ Eby wrote:
It shouldn't - all you should need is to use getattr(sys.modules[self.modname], self.attr) instead of referencing a parent path object directly.
The problem isn't the lookup, it's coming up with self.modname and self.attr. As it currently stands, PathFinder.find_module is given the parent path, not the module name and attribute name used to look up the parent path using sys.modules and getattr.
Right, that's what PJE and I were discussing. Instead of passing in the path object directly, you can instead pass an object that *lazily* retrieves the path object in its __iter__ method:
Hey, one message at a time! I'm just reading those now. I'd like to hear Brett's comments on this approach. Eric.
On Wed, May 23, 2012 at 9:10 AM, Eric V. Smith <eric@trueblade.com> wrote:
On 05/23/2012 09:02 AM, Nick Coghlan wrote:
On Wed, May 23, 2012 at 10:31 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 05/22/2012 09:49 PM, PJ Eby wrote:
It shouldn't - all you should need is to use getattr(sys.modules[self.modname], self.attr) instead of referencing a parent path object directly.
The problem isn't the lookup, it's coming up with self.modname and self.attr. As it currently stands, PathFinder.find_module is given the parent path, not the module name and attribute name used to look up the parent path using sys.modules and getattr.
Right, that's what PJE and I were discussing. Instead of passing in the path object directly, you can instead pass an object that *lazily* retrieves the path object in its __iter__ method:
Hey, one message at a time! I'm just reading those now.
I'd like to hear Brett's comments on this approach.
If I understand the proposal correctly, this would be a change in NamespaceLoader in how it sets __path__ and in no way affect any other code since __import__() just grabs the object on __path__ and passes as an argument to the meta path finders which just iterate over the object, so I have no objections to it. -Brett
On Wed, May 23, 2012 at 3:02 PM, Brett Cannon <brett@python.org> wrote:
If I understand the proposal correctly, this would be a change in NamespaceLoader in how it sets __path__ and in no way affect any other code since __import__() just grabs the object on __path__ and passes as an argument to the meta path finders which just iterate over the object, so I have no objections to it.
That's not *quite* the proposal (but almost). The change would also mean that __import__() instead passes a ModulePath (aka Nick's LazyIterable) instance to the meta path finders, which just iterate over it. But other than that, yes.
On Wed, May 23, 2012 at 3:35 PM, PJ Eby <pje@telecommunity.com> wrote:
On Wed, May 23, 2012 at 3:02 PM, Brett Cannon <brett@python.org> wrote:
If I understand the proposal correctly, this would be a change in NamespaceLoader in how it sets __path__ and in no way affect any other code since __import__() just grabs the object on __path__ and passes as an argument to the meta path finders which just iterate over the object, so I have no objections to it.
That's not *quite* the proposal (but almost). The change would also mean that __import__() instead passes a ModulePath (aka Nick's LazyIterable) instance to the meta path finders, which just iterate over it. But other than that, yes.
And why does __import__() need to construct that? I thought NamespaceLoader was going to be making these "magical" __path__ objects that detected changes and thus update themselves as necessary and just stick them on the object. Why specifically does __import__() need to play a role?
On 05/23/2012 03:56 PM, Brett Cannon wrote:
On Wed, May 23, 2012 at 3:35 PM, PJ Eby <pje@telecommunity.com> wrote:
On Wed, May 23, 2012 at 3:02 PM, Brett Cannon <brett@python.org> wrote:
If I understand the proposal correctly, this would be a change in NamespaceLoader in how it sets __path__ and in no way affect any other code since __import__() just grabs the object on __path__ and passes as an argument to the meta path finders which just iterate over the object, so I have no objections to it.
That's not *quite* the proposal (but almost). The change would also mean that __import__() instead passes a ModulePath (aka Nick's LazyIterable) instance to the meta path finders, which just iterate over it. But other than that, yes.
And why does __import__() need to construct that? I thought NamespaceLoader was going to be making these "magical" __path__ objects that detected changes and thus update themselves as necessary and just stick them on the object. Why specifically does __import__() need to play a role?
Assume that we're talking about importing either a top-level namespace package named 'parent' or a nested namespace package parent.child. The problem is that NamespaceLoader is just passed the parent path (typically sys.path, but if a sub-package then parent.__path__). The concern is that if the parent path object is replaced:

    sys.path = sys.path + ['new-dir']

or:

    parent.__path__ = ['new-dir']

then the NamespaceLoader instance can no longer detect changes to parent_path. So the proposed solution is for NamespaceLoader to be told the name of the parent module ('sys' or 'parent') and the attribute name to use to find the path ('path' or '__path__').

Here's another suggestion: instead of modifying the finder/loader code to pass these names through, assume that we can always find (module_name, attribute_name) with this code:

    def find_parent_path_names(module):
        parent, dot, me = module.__name__.rpartition('.')
        if dot == '':
            return 'sys', 'path'
        return parent, '__path__'
    >>> import glob
    >>> find_parent_path_names(glob)
    ('sys', 'path')
    >>> import unittest.test.test_case
    >>> find_parent_path_names(unittest.test.test_case)
    ('unittest.test', '__path__')
I guess it's a little more fragile than passing in these names to NamespaceLoader, but it requires less code to change. I think I'll whip this up in the pep-420 branch. Eric.
Here's another suggestion: instead of modifying the finder/loader code to pass these names through, assume that we can always find (module_name, attribute_name) with this code:
    def find_parent_path_names(module):
        parent, dot, me = module.__name__.rpartition('.')
        if dot == '':
            return 'sys', 'path'
        return parent, '__path__'
    >>> import glob
    >>> find_parent_path_names(glob)
    ('sys', 'path')
    >>> import unittest.test.test_case
    >>> find_parent_path_names(unittest.test.test_case)
    ('unittest.test', '__path__')
I guess it's a little more fragile than passing in these names to NamespaceLoader, but it requires less code to change.
I think I'll whip this up in the pep-420 branch.
I tried this approach and it works fine. The only caveat is that it assumes that the parent path can always be computed as described above, independent of what's passed in to PathFinder.load_module(). I think that's reasonable, since load_module() itself hard-codes sys.path if the supplied path is missing.

I've checked this in to the pep-420 branch. I prefer this approach over Nick's because it doesn't require any changes to any existing interfaces. The changes are contained to the namespace package code and don't affect other code in importlib.

Assuming this approach is acceptable, I'm done with the PEP except for adding some examples. And I'm done with the implementation except for adding tests and a few small tweaks.

Eric.
On Wed, May 23, 2012 at 8:24 PM, Eric V. Smith <eric@trueblade.com> wrote:
I tried this approach and it works fine. The only caveat is that it assumes that the parent path can always be computed as described above, independent of what's passed in to PathFinder.load_module(). I think that's reasonable, since load_module() itself hard-codes sys.path if the supplied path is missing.
Technically, PEP 302 says that finders aren't allowed to assume their parent packages are imported:

""" However, the find_module() method isn't necessarily always called during an actual import: meta tools that analyze import dependencies (such as freeze, Installer or py2exe) don't actually load modules, so a finder shouldn't *depend* on the parent package being available in sys.modules."""

OTOH, that's finders, and I think we're dealing with loaders here. Splitting hairs, perhaps, but at least it's in a good cause. ;-)

I've checked this in to the pep-420 branch. I prefer this approach over Nick's because it doesn't require any changes to any existing interfaces. The changes are contained to the namespace package code and don't affect other code in importlib.
Assuming this approach is acceptable, I'm done with the PEP except for adding some examples.
And I'm done with the implementation except for adding tests and a few small tweaks.
Yay!
On 5/23/2012 8:58 PM, PJ Eby wrote:
On Wed, May 23, 2012 at 8:24 PM, Eric V. Smith <eric@trueblade.com> wrote:
I tried this approach and it works fine. The only caveat is that it assumes that the parent path can always be computed as described above, independent of what's passed in to PathFinder.load_module(). I think that's reasonable, since load_module() itself hard-codes sys.path if the supplied path is missing.
Technically, PEP 302 says that finders aren't allowed to assume their parent packages are imported:
""" However, the find_module() method isn't necessarily always called during an actual import: meta tools that analyze import dependencies (such as freeze, Installer or py2exe) don't actually load modules, so a finder shouldn't /depend/ on the parent package being available in sys.modules."""
OTOH, that's finders, and I think we're dealing with loaders here. Splitting hairs, perhaps, but at least it's in a good cause. ;-)
I guess I could store the passed-in parent path, and use that if it can't be found through sys.modules. I'm not sure I can conjure up code to test this.
On Wed, May 23, 2012 at 9:02 PM, Eric V. Smith <eric@trueblade.com> wrote:
On Wed, May 23, 2012 at 8:24 PM, Eric V. Smith <eric@trueblade.com> wrote:

I tried this approach and it works fine. The only caveat is that it assumes that the parent path can always be computed as described above, independent of what's passed in to PathFinder.load_module(). I think that's reasonable, since load_module() itself hard-codes sys.path if the supplied path is missing.

On 5/23/2012 8:58 PM, PJ Eby wrote:
Technically, PEP 302 says that finders aren't allowed to assume their parent packages are imported:
""" However, the find_module() method isn't necessarily always called during an actual import: meta tools that analyze import dependencies (such as freeze, Installer or py2exe) don't actually load modules, so a finder shouldn't /depend/ on the parent package being available in sys.modules."""
OTOH, that's finders, and I think we're dealing with loaders here. Splitting hairs, perhaps, but at least it's in a good cause. ;-)
I guess I could store the passed-in parent path, and use that if it can't be found through sys.modules.
I'm not sure I can conjure up code to test this.
I actually was suggesting that we change PEP 302, if it became an issue. ;-)
On Thu, May 24, 2012 at 11:02 AM, Eric V. Smith <eric@trueblade.com> wrote:
On 5/23/2012 8:58 PM, PJ Eby wrote:
OTOH, that's finders, and I think we're dealing with loaders here. Splitting hairs, perhaps, but at least it's in a good cause. ;-)
I guess I could store the passed-in parent path, and use that if it can't be found through sys.modules.
I'm not sure I can conjure up code to test this.
I don't think there's a need to change anything from your current strategy, but we should be clear in the docs:

1. Finders should *not* assume their parent packages have been loaded (and should not load them implicitly)
2. Loaders *can* assume their parent packages have already been loaded and are present in sys.modules (and can complain if they're not there)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I've reviewed the updates to the PEP and have accepted it. Congrats all! I know the implementation is lagging behind a bit, that's not a problem. Just get it into the next 3.3 alpha release! -- --Guido van Rossum (python.org/~guido)
On 5/24/2012 2:33 PM, Guido van Rossum wrote:
I've reviewed the updates to the PEP and have accepted it. Congrats all!
Thanks to the many people who helped: Martin, Barry, Guido, Jason, Nick, PJE, and others. I'm sure I've offended someone by leaving them out, and I apologize in advance. But special thanks to Brett. Without his work on importlib, this never would have happened (as Barry, Jason, and I demonstrated on two or three occasions)!
I know the implementation is lagging behind a bit, that's not a problem. Just get it into the next 3.3 alpha release!
It's only missing a few small things. I'll get it committed in the next day or so. Eric.
On May 23, 2012 9:02 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On Wed, May 23, 2012 at 10:31 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 05/22/2012 09:49 PM, PJ Eby wrote:
It shouldn't - all you should need is to use getattr(sys.modules[self.modname], self.attr) instead of referencing a parent path object directly.
The problem isn't the lookup, it's coming up with self.modname and self.attr. As it currently stands, PathFinder.find_module is given the parent path, not the module name and attribute name used to look up the parent path using sys.modules and getattr.
Right, that's what PJE and I were discussing. Instead of passing in the path object directly, you can instead pass an object that *lazily* retrieves the path object in its __iter__ method:
from importlib import import_module

class LazyIterable:
    """On iteration, retrieves a reference to a named iterable and
    returns an iterator over that iterable"""
    def __init__(self, modname, attribute):
        self.modname = modname
        self.attribute = attribute
    def __iter__(self):
        # Will almost always get a hit directly in sys.modules
        mod = import_module(self.modname)
        return iter(getattr(mod, self.attribute))
Where importlib currently passes None or sys.path as the path argument to find_module(), instead pass "LazyIterable('sys', 'path')" and where it currently passes package.__path__, instead pass "LazyIterable(package.__name__, '__path__')".
The existing for loop iteration and tuple() calls should then take care of the lazy lookup automatically.
That way, the only code that needs to know the values of modname and attribute is the code that already has access to those values.
Perhaps calling it a ModulePath instead of a LazyIterable would be better? Also, this is technically a change from PEP 302, which says the actual sys.path or __path__ are passed to find_module(). I'm not sure whether any find_module() code ever written actually *cares* about this, though. (Especially if, as I believe I understand in this context, we're only talking about meta-importers.)
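A brief usage sketch of the substitution being discussed, assuming the LazyIterable class quoted above is in scope (this is the idea only, not importlib's actual code):

# What the import machinery would pass in each case:
top_level = LazyIterable('sys', 'path')           # instead of sys.path
# and, for submodules of some package 'pkg':
# sub_level = LazyIterable('pkg', '__path__')     # instead of pkg.__path__

# Existing finder code keeps working unchanged, because it only ever
# iterates over the path argument; the attribute lookup happens here:
for entry in top_level:
    pass  # each entry is a sys.path string, resolved at iteration time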
On Wed, May 23, 2012 at 10:40 AM, Eric V. Smith <eric@trueblade.com> wrote:
On 5/22/2012 2:37 PM, Guido van Rossum wrote:
Okay, I've been convinced that keeping the dynamic path feature is a good idea. I am really looking forward to seeing the rationale added to the PEP -- that's pretty much the last thing on my list that made me hesitate. I'll leave the details of exactly how the parent path is referenced up to the implementation team (several good points were made), as long as the restriction that sys.path must be modified in place is lifted.
I've updated the PEP. Let me know how it looks.
I have not updated the implementation yet. I'm not exactly sure how I'm going to convert from a path list of unknown origin to ('sys', 'path') or ('foo', '__path__'). I'll look at it later tonight to see if it's possible. I'm hoping it doesn't require major surgery to importlib._bootstrap.
If you wanted to do this without changing the sys.meta_path hook API, you'd have to pass an object to find_module() that did the dynamic lookup of the value in obj.__iter__. Something like:

import sys

class _LazyPath:
    def __init__(self, modname, attribute):
        self.modname = modname
        self.attribute = attribute
    def __iter__(self):
        return iter(getattr(sys.modules[self.modname], self.attribute))

A potentially cleaner alternative to consider is tweaking the find_loader API spec so that it gets used at the meta path level as well as at the path hooks level and is handed a *callable* that dynamically retrieves the path rather than a direct reference to the path itself.

The full signature of find_loader would then become:

def find_loader(fullname, get_path=None):
    # fullname as for find_module
    # When get_path is None, it means the finder is being called as a
    # path hook and should use the specific path entry passed to __init__.
    # In this case, namespace package portions are returned as (None, portions).
    # Otherwise, the finder is being called as a meta_path hook and
    # get_path() will return the relevant path. Any namespace packages
    # are then returned as (loader, portions).

There are two major consequences of this latter approach:
- the PEP 302 find_module API would now be a purely legacy interface for both the meta_path and path_hooks, used only if find_loader is not defined
- it becomes trivial to tell whether a particular name references a package or not *without* needing to load it first: find_loader() returns a non-empty iterable for the list of portions

That second consequence is rather appealing: it means you'd be able to implement an almost complete walk of a package hierarchy *without* having to import anything (although you would miss old-style namespace packages and any other packages that alter their own __path__ in __init__, so you may still want to load packages to make sure you found everything; you could definitively answer the "is this a package or not?" question without running any code, though).

The first consequence is also appealing, since the find_module() name is more than a little misleading. The "find_module" name strongly suggests that the method is expected to return a module object, and that's just wrong - you actually find a loader, then you use that to load the module.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, May 22, 2012 at 9:58 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
If you wanted to do this without changing the sys.meta_path hook API, you'd have to pass an object to find_module() that did the dynamic lookup of the value in obj.__iter__. Something like:
import sys

class _LazyPath:
    def __init__(self, modname, attribute):
        self.modname = modname
        self.attribute = attribute
    def __iter__(self):
        return iter(getattr(sys.modules[self.modname], self.attribute))
A potentially cleaner alternative to consider is tweaking the find_loader API spec so that it gets used at the meta path level as well as at the path hooks level and is handed a *callable* that dynamically retrieves the path rather than a direct reference to the path itself.
The full signature of find_loader would then become:
def find_loader(fullname, get_path=None):
    # fullname as for find_module
    # When get_path is None, it means the finder is being called as a
    # path hook and should use the specific path entry passed to __init__.
    # In this case, namespace package portions are returned as (None, portions).
    # Otherwise, the finder is being called as a meta_path hook and
    # get_path() will return the relevant path. Any namespace packages
    # are then returned as (loader, portions).
There are two major consequences of this latter approach:
- the PEP 302 find_module API would now be a purely legacy interface for both the meta_path and path_hooks, used only if find_loader is not defined
- it becomes trivial to tell whether a particular name references a package or not *without* needing to load it first: find_loader() returns a non-empty iterable for the list of portions
That second consequence is rather appealing: it means you'd be able to implement an almost complete walk of a package hierarchy *without* having to import anything (although you would miss old-style namespace packages and any other packages that alter their own __path__ in __init__, so you may still want to load packages to make sure you found everything; you could definitively answer the "is this a package or not?" question without running any code, though).
The first consequence is also appealing, since the find_module() name is more than a little misleading. The "find_module" name strongly suggests that the method is expected to return a module object, and that's just wrong - you actually find a loader, then you use that to load the module.
While I see no problem with cleaning up the interface, I'm kind of lost as to the point of making a get_path callable, vs. just using the iterable interface you sketched. Python has iterables, so why add a call to get the iterable, when iter() or a straight "for" loop will do effectively the same thing?
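To make PJ's point concrete, here is a tiny hedged demo: a plain for loop (or list() call) over the _LazyPath sketch quoted above already performs the lookup lazily, so no separate get_path callable is needed. This even covers rebinding sys.path to a new list, not just in-place modification:

import sys

lazy = _LazyPath('sys', 'path')                # assumes the sketch above

sys.path = sys.path + ['/tmp/late-addition']   # rebinds sys.path entirely
assert '/tmp/late-addition' in list(lazy)      # iteration sees the new binding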
On Wed, May 23, 2012 at 1:58 PM, PJ Eby <pje@telecommunity.com> wrote:
While I see no problem with cleaning up the interface, I'm kind of lost as to the point of making a get_path callable, vs. just using the iterable interface you sketched. Python has iterables, so why add a call to get the iterable, when iter() or a straight "for" loop will do effectively the same thing?
Yeah, I'm not sure what I was thinking either, since just documenting the interface and providing LazyPath as a public API somewhere in importlib should suffice. Meta path hooks are already going to need to tolerate being handed arbitrary iterables, since that's exactly what namespace package path objects are going to be.
While I still like the idea of killing off find_module() completely rather than leaving it in at the meta_path level, there's no reason that needs to be done as part of PEP 420 itself. Instead, it can be done later if anyone comes up with a concrete use case for accessing the path details without loading packages and modules.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
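As a hedged illustration of that tolerance requirement, a meta path hook can simply materialize whatever path object it receives — a plain list, a namespace package's read-only path object, or a lazy wrapper like the _LazyPath sketch above — before inspecting it. The finder name here is invented for this sketch:

import sys

class TolerantMetaFinder:
    def find_module(self, fullname, path=None):
        # tuple() works on any iterable, so the hook doesn't care what
        # kind of path object the import system handed it
        entries = tuple(path) if path is not None else tuple(sys.path)
        for entry in entries:
            pass  # inspect each path entry as usual
        return None  # this sketch never claims any module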
participants (7)
- Barry Warsaw
- Brett Cannon
- Eric Snow
- Eric V. Smith
- Guido van Rossum
- Nick Coghlan
- PJ Eby