As a simple example to back up PJE's explanation, consider:
1. encodings becomes a namespace package
2. It sometimes gets imported during interpreter startup to initialise the standard io streams
3. An application modifies sys.path after startup and wants to contribute additional encodings
Searching the entire parent path for new portions on every import would be needlessly slow.
Not recognising new portions would be needlessly confusing for users. In our simple case above, the application would fail if the io initialisation accessed the encodings package, but work if it did not (e.g. when all streams are utf-8).
PEP 420 splits the difference via an automatically invalidated cache: when you iterate over a namespace package __path__ object, it rescans the parent path for new portions *if and only if* the contents of the parent path have changed since the previous scan.
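A minimal sketch of that kind of automatically invalidated cache, to make the "if and only if" concrete. This is not the PEP 420 implementation: the class name `CachedPortions`, the `get_parent_path` callable and the `scans` counter are all made up for illustration, and a "portion" is simplified to "a directory with the package's name".

```python
import os


class CachedPortions:
    """Hypothetical stand-in for a namespace package's dynamic __path__."""

    def __init__(self, name, get_parent_path):
        self._name = name                        # e.g. "encodings"
        self._get_parent_path = get_parent_path  # callable returning the parent path
        self._last_parent = None                 # snapshot taken at the previous scan
        self._portions = []
        self.scans = 0                           # illustration only: counts rescans

    def _recalculate(self):
        parent = tuple(self._get_parent_path())
        if parent != self._last_parent:
            # The parent path changed since the last scan, so look for portions again.
            self._portions = [
                os.path.join(entry, self._name)
                for entry in parent
                if os.path.isdir(os.path.join(entry, self._name))
            ]
            self._last_parent = parent
            self.scans += 1
        return self._portions

    def __iter__(self):
        return iter(self._recalculate())

    def __repr__(self):
        # Deliberately not the repr of a plain list.
        return f"CachedPortions({self._portions!r})"
```

Iterating twice over an unchanged parent path costs one tuple comparison the second time; appending a new entry to the parent path triggers exactly one rescan on the next iteration.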
Sent from my phone, thus the relative brevity :)
On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <email@example.com> wrote:
Ah, I see. But I disagree that this is a reasonable constraint on
sys.path. The magic __path__ object of a toplevel namespace module
should know it is a toplevel module, and explicitly refetch sys.path
rather than just keeping around a copy.
That's fine by me - the class could actually be defined to take a module name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then there'd be no need to special case anything: it would behave exactly the same way for subpackages and top-level packages.
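A rough sketch of what "take a module name and attribute" could look like. The class `PathRef` and the module name `pathref_demo_parent` are hypothetical, chosen only for this illustration. The point is that the parent path is looked up afresh each time, so rebinding sys.path to a brand-new list is still noticed.

```python
import sys
import types


class PathRef:
    """Hypothetical reference to a path stored as an attribute of a module."""

    def __init__(self, module_name, attr_name):
        self.module_name = module_name   # "sys" or a parent package name
        self.attr_name = attr_name       # "path" or "__path__"

    def current(self):
        # Refetch through sys.modules on every call instead of keeping a copy.
        return tuple(getattr(sys.modules[self.module_name], self.attr_name))


# Top-level case: the parent path is sys.path.
top_level = PathRef("sys", "path")

# Subpackage case: the parent path is the parent package's __path__.
fake_parent = types.ModuleType("pathref_demo_parent")
fake_parent.__path__ = ["/hypothetical/parent"]
sys.modules["pathref_demo_parent"] = fake_parent
nested = PathRef("pathref_demo_parent", "__path__")
```

Both cases go through the same two-line lookup, which is why no special casing would be needed.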
This leaves the magic __path__ objects for namespace modules, which I
could live with, as long as their repr was not the same as a list, and
assuming a good rationale is given. Although I'd still prefer plain
lists here as well; I'd like to be able to manually construct a
namespace package and force its directories to be a specific set of
directories that I happen to know about, regardless of whether they
are related to sys.path or not. And I'd like to know that my setup in
that case would not be disturbed by changes to sys.path.
To do that, you just assign to __path__, the same as now, a la __path__ = pkgutil.extend_path(). The auto-updating is in the initially-assigned __path__ object, not the module object or some sort of generalized magic.
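As an illustration of the manual case, here is a sketch that builds a package by hand and pins its __path__ to a plain list. The names `fixedns`, `spam` and `ham` and the temporary directories are assumptions made up for this example. Because __path__ is an ordinary list, later changes to sys.path do not affect it.

```python
import importlib
import os
import sys
import tempfile
import types

# One directory we "happen to know about", unrelated to sys.path.
known = os.path.join(tempfile.mkdtemp(), "fixedns")
os.makedirs(known)
with open(os.path.join(known, "spam.py"), "w") as f:
    f.write("VALUE = 'spam'\n")

# Another directory that will be reachable from sys.path later on.
extra_root = tempfile.mkdtemp()
os.makedirs(os.path.join(extra_root, "fixedns"))
with open(os.path.join(extra_root, "fixedns", "ham.py"), "w") as f:
    f.write("VALUE = 'ham'\n")

# Manually constructed package with a static, plain-list __path__.
pkg = types.ModuleType("fixedns")
pkg.__path__ = [known]
sys.modules["fixedns"] = pkg

importlib.invalidate_caches()
spam = importlib.import_module("fixedns.spam")

# Changing sys.path afterwards does not disturb the hand-built setup.
sys.path.append(extra_root)
try:
    importlib.import_module("fixedns.ham")
    ham_found = True
except ImportError:
    ham_found = False
```

Here `ham_found` ends up false: the submodule search only consults the list that was assigned, not sys.path.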
I'd like to hear more about this from Philip -- is that feature
actually widely used?
Well, it's built into setuptools, so yes. ;-) It gets used any time a dynamically specified dependency is used that might contain a namespace package. This means, for example, that every setup script out there using "setup.py test", every project using certain paste.deploy features... it's really difficult to spell out the scope of things that are using this, in the context of setuptools and distribute, because there are an immense number of ways to indirectly rely on it.
This doesn't mean that the feature can't continue to be implemented inside setuptools' dynamic dependency system, but the code to do it in setuptools is MUCH more complicated than the PEP 420 code, and doesn't work if you manually add something to sys.path without asking setuptools to do it. It's also somewhat timing-sensitive, depending on when and whether you import 'site' and pkg_resources, and whether you are mixing eggs and non-eggs in your namespace packages.
In short, the implementation is a huge mess that the PEP 420 approach would vastly simplify.
But... that wasn't the original reason why I proposed it. The original reason was simply that it makes namespace packages act more like the equivalents do in other languages. While being able to override __path__ can be considered a feature of Python, its being static by default is NOT a feature, in the same way that *requiring* an __init__.py is not really a feature.
The principle of least surprise says (at least IMO) that if you add a directory to sys.path, you should be able to import stuff from it. That whether it works depends on whether or not you already imported part of a namespace package earlier is both surprising and confusing. (More on this below.)
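To make the timing issue concrete, here is a small sketch, assuming an interpreter that implements PEP 420 with the dynamic __path__ (Python 3.3 or later). The package name `demo_ns`, the module names and the temporary directories are made up for the example.

```python
import importlib
import os
import sys
import tempfile


def make_portion(root, package, module, body):
    """Create root/package/module.py with no __init__.py (a namespace portion)."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
make_portion(first, "demo_ns", "wiz", "NAME = 'wiz'\n")
make_portion(second, "demo_ns", "ham", "NAME = 'ham'\n")

sys.path.append(first)
importlib.invalidate_caches()
wiz = importlib.import_module("demo_ns.wiz")   # demo_ns is now imported, one portion

sys.path.append(second)                         # added *after* the first import
ham = importlib.import_module("demo_ns.ham")    # still importable with a dynamic __path__
portions = list(sys.modules["demo_ns"].__path__)
```

With a static __path__ captured at the first import, the second import_module call would fail, and whether it fails would depend only on what happened to be imported earlier.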
What would a package have to do if the feature didn't exist?
Continue to depend on setuptools to do it for them, or use some hypothetical update API... but that's not really the right question. ;-)
The right question is, what would happen to package *users* if the feature didn't exist?
And the answer to that question is, "you must call this hypothetical update API *every time* you change sys.path, because otherwise your imports might break, depending on whether or not some other package imported something from a namespace before you changed sys.path".
And of course, you also need to make sure that any third-party code you use does this too, if it adds something to sys.path for you.
And if you're writing cross-Python-version code, you need to check to make sure whether the API is actually available.
And if you're someone helping Python newbies, you need to add this to your list of debugging questions for import-related problems.
And remember: if you forget to do this, it might not break now. It'll break later, when you add that other plugin or update that random module that dynamically decides to import something that just happens to be in a namespace package, so be prepared for it to break your application in the field, when an end-user is using it with a collection of plugins that you haven't tested together, or in the same import sequence...
The people using setuptools won't have these problems, but *new* Python users will, as people begin using a PEP 420 that lacks this feature.
The key scope question, I think, is: "How often do programs change sys.path at runtime, and what have they imported up to that point?" (Because for the other part of the scope, I think it's a fairly safe bet that namespace packages are going to become even *more* popular than they are now, once PEP 420 is in place.)
But the key API/usability question is: "What's the One Obvious Way to add/change what's importable?"
And I believe the answer to that question is, "change sys.path", not "change sys.path, and then import some other module to call another API to say, 'yes, I really *meant* to update sys.path, thank you very much.'"
(Especially since NOT requiring that extra API isn't going to break any existing code.)
I'd really much rather not have this feature, which
reeks of too much magic to me. (An area where Philip and I often
My take on it is that it only SEEMS like magic, because we're used to static __path__. But other languages don't have per-package __path__ in the first place, so there's nothing to "automatically update", and so it's not magic at all that other subpackages/modules can be found when the system path changes!
So, under the PEP 420 approach, it's *static* __path__ that's really the weird special case, and should be considered so. (After all, __path__ is and was primarily an implementation optimization and compatibility hack, rather than a user-facing "feature" of the import system.)
For example, when *would* you want to explicitly spell out a namespace package __path__, and restrict it from seeing sys.path changes? I've not seen *anybody* ask for this feature in the context of setuptools; it's only ever been bug reports about when the more complicated implementation fails to detect an update.
So, to wrap up:
* The primary rationale for the feature is that "least surprise" for a new user to Python is that adding to sys.path should allow importing a portion of a namespace, whether or not you've already imported some other thing in that namespace. Symmetry with other languages and with other Python features (e.g. changing the working directory in an interactive interpreter) suggests it, and the removal of a similar timing dependency from PEP 402 (preventing direct import of a namespace-only package unless you imported a subpackage first) suggests that the same type of timing dependency should be removed here, too. (Note, for example, that I may not know that importing baz.spam indirectly causes some part of foo.wiz to be imported, and that if I then add another directory to sys.path containing a foo.* portion, my code will *no longer work* when I try to import foo.ham. This is much more "magical" behavior, in least-surprise terms!)
* The constraints on sys.path and package __path__ objects can and should be removed, by making the dynamic path objects refer to a module and attribute, instead of directly referencing parent __path__ objects. Code that currently manipulates __path__ will not break, because such code will not be using PEP 420 namespace packages anyway (and so __path__ will be a list). (Even so, the most common __path__ manipulation idiom is "__path__ = pkgutil.extend_path(...)" anyway!)
* Namespace packages are a widely used feature of setuptools, and AFAIK nobody has *ever* asked to stop dynamic additions to namespace __path__, but a wide assortment of things people do with setuptools rely on dynamic additions under the hood. Providing the feature in PEP 420 gives a migration path away from setuptools, at least for this one feature. (Specifically, it does away with the need to use declare_namespace(), and the need to do all sys.path manipulation via setuptools' requirements API.)
* Self-contained (__init__.py packages) and fixed __path__ lists can and should be considered the "magic" or "special case" parts of importing in Python 3, even though we're accustomed to them being central import concepts in Python 2. Modules and namespace packages can and should be the default case from an instructional POV, and sys.path updating should reflect this. (That is, future tutorials should introduce modules, then namespace packages, and finally self-contained packages with __init__ and __path__, because the *idea* of a namespace package doesn't depend on __path__ existing in the first place; it's essentially only a historical accident that self-contained packages were implemented in Python first.)
Python-Dev mailing list