[Python-Dev] PEP 420 - dynamic path computation is missing rationale

PJ Eby pje at telecommunity.com
Mon May 21 20:08:16 CEST 2012


On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <guido at python.org> wrote:

> Ah, I see. But I disagree that this is a reasonable constraint on
> sys.path. The magic __path__ object of a toplevel namespace module
> should know it is a toplevel module, and explicitly refetch sys.path
> rather than just keeping around a copy.
>

That's fine by me - the class could actually be defined to take a module
name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then
there'd be no need to special-case anything: it would behave exactly the
same way for subpackages and top-level packages.
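
Sketching that out (purely illustrative -- this isn't the actual PEP 420
code, and the class and helper names here are made up):

    import sys

    class _DynamicPath(object):
        # Recomputes a namespace package's search path from the *current*
        # value of the named parent attribute (e.g. ('sys', 'path') or
        # ('foo', '__path__')), instead of keeping a snapshot taken at
        # import time.
        def __init__(self, parent_module, parent_attr, compute_portions):
            self._parent = (parent_module, parent_attr)
            # compute_portions: callable mapping the parent path to the
            # list of directories containing portions of this namespace
            self._compute = compute_portions
            self._last_seen = None
            self._dirs = []

        def _recalculate(self):
            modname, attr = self._parent
            parent_path = list(getattr(sys.modules[modname], attr))
            if parent_path != self._last_seen:
                self._last_seen = parent_path
                self._dirs = self._compute(parent_path)
            return self._dirs

        def __iter__(self):
            return iter(self._recalculate())

        def __repr__(self):
            return '_DynamicPath(%r)' % (self._recalculate(),)

    # A top-level namespace package gets _DynamicPath('sys', 'path', ...),
    # a nested one gets _DynamicPath('foo', '__path__', ...) -- same class,
    # nothing special-cased.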


> This leaves the magic __path__ objects for namespace modules, which I
> could live with, as long as their repr was not the same as a list, and
> assuming a good rationale is given. Although I'd still prefer plain
> lists here as well; I'd like to be able to manually construct a
> namespace package and force its directories to be a specific set of
> directories that I happen to know about, regardless of whether they
> are related to sys.path or not. And I'd like to know that my setup in
> that case would not be disturbed by changes to sys.path.
>

To do that, you just assign to __path__, the same as now, a la "__path__ =
pkgutil.extend_path(__path__, __name__)".  The auto-updating lives in the
initially-assigned __path__ object, not in the module object or some sort
of generalized magic.
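
That is, something like this (the package and directory names are made up):

    import foo                 # some namespace package
    # A plain list never tracks sys.path, so this pins the package's
    # search locations to exactly these directories:
    foo.__path__ = ['/opt/myapp/foo', '/usr/share/myapp/foo']

    # And the existing idiom inside a regular package's __init__.py keeps
    # working exactly as it does today:
    #     from pkgutil import extend_path
    #     __path__ = extend_path(__path__, __name__)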


> I'd like to hear more about this from Philip -- is that feature
> actually widely used?


Well, it's built into setuptools, so yes.  ;-)  It kicks in any time a
dynamically specified dependency might contain a namespace package.  That
covers, for example, every setup script out there using "setup.py test",
and every project using certain paste.deploy features...  it's really
difficult to spell out the full scope of things that rely on this in the
context of setuptools and distribute, because there are an immense number
of ways to depend on it indirectly.

This doesn't mean that the feature can't continue to be implemented inside
setuptools' dynamic dependency system, but the code to do it in setuptools
is MUCH more complicated than the PEP 420 code, and doesn't work if you
manually add something to sys.path without asking setuptools to do it.
It's also somewhat timing-sensitive, depending on when and whether you
import 'site' and pkg_resources, and whether you are mixing eggs and
non-eggs in your namespace packages.

In short, the implementation is a huge mess that the PEP 420 approach would
vastly simplify.

But...  that wasn't the original reason why I proposed it.  The original
reason was simply that it makes namespace packages act more like the
equivalents do in other languages.  While being able to override __path__
can be considered a feature of Python, its being static by default is NOT a
feature, in the same way that *requiring* an __init__.py is not really a
feature.

The principle of least surprise says (at least IMO) that if you add a
directory to sys.path, you should be able to import stuff from it.  Making
whether that works depend on whether or not you already imported part of a
namespace package earlier is both surprising and confusing.  (More on this
below.)
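
Concretely, here's the kind of thing that bites people (the directory
layout is made up):

    # Two plugin directories, each contributing a portion of the same
    # namespace package "zope":
    #
    #   /plugins/base/zope/interface/...
    #   /plugins/extra/zope/component/...

    import sys

    sys.path.append('/plugins/base')
    import zope.interface      # fine; zope's __path__ is computed here

    sys.path.append('/plugins/extra')
    import zope.component      # with a *static* zope.__path__ this fails
                               # with ImportError, even though the directory
                               # is sitting right there on sys.path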



> What would a package have to do if the feature didn't exist?


Continue to depend on setuptools to do it for them, or use some
hypothetical update API...   but that's not really the right question.  ;-)

The right question is, what happens to package *users* if the feature
doesn't exist?

And the answer to that question is, "you must call this hypothetical update
API *every time* you change sys.path, because otherwise your imports might
break, depending on whether or not some other package imported something
from a namespace before you changed sys.path".

And of course, you also need to make sure that any third-party code you use
does this too, if it adds something to sys.path for you.

And if you're writing cross-Python-version code, you need to check to make
sure whether the API is actually available.

And if you're someone helping Python newbies, you need to add this to your
list of debugging questions for import-related problems.

And remember: if you forget to do this, it might not break now.  It'll
break later, when you add that other plugin or update that random module
that dynamically decides to import something that just happens to be in a
namespace package, so be prepared for it to break your application in the
field, when an end-user is using it with a collection of plugins that you
haven't tested together, or in the same import sequence...

The people using setuptools won't have these problems, but *new* Python
users will, once people begin using a PEP 420 implementation that lacks
this feature.

The key scope question, I think, is: "How often do programs change sys.path
at runtime, and what have they imported up to that point?"  (Because for
the other part of the scope, I think it's a fairly safe bet that namespace
packages are going to become even *more* popular than they are now, once
PEP 420 is in place.)

But the key API/usability question is: "What's the One Obvious Way to
add/change what's importable?"

And I believe the answer to that question is, "change sys.path", not
"change sys.path, and then import some other module to call another API to
say, 'yes, I really *meant* to update sys.path, thank you very much.'"

(Especially since NOT requiring that extra API isn't going to break any
existing code.)
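
In code terms, that's the difference between the two snippets below (the
'myns' package, the directory, and the fixup call are all made up --
setuptools' rough analogue of the latter today is
pkg_resources.fixup_namespace_packages()):

    import sys
    plugin_dir = '/path/to/plugins'        # made-up example directory

    # The One Obvious Way:
    sys.path.append(plugin_dir)
    import myns.plugin

    # ...and what a static-__path__ world requires instead:
    sys.path.append(plugin_dir)
    some_update_api.refresh_namespace_paths(plugin_dir)   # hypothetical API
    import myns.plugin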



> I'd really much rather not have this feature, which
> reeks of too  much magic to me. (An area where Philip and I often
> disagree. :-)
>

My take on it is that it only SEEMS like magic, because we're used to
static __path__.  But other languages don't have per-package __path__ in
the first place, so there's nothing to "automatically update", and so it's
not magic at all that other subpackages/modules can be found when the
system path changes!

So, under the PEP 420 approach, it's *static* __path__ that's really the
weird special case, and should be considered so.  (After all, __path__ is
and was primarily an implementation optimization and compatibility hack,
rather than a user-facing "feature" of the import system.)

For example, when *would* you want to explicitly spell out a namespace
package __path__, and restrict it from seeing sys.path changes?  I've not
seen *anybody* ask for this feature in the context of setuptools; it's only
ever been bug reports about when the more complicated implementation fails
to detect an update.

So, to wrap up:

* The primary rationale for the feature is that "least surprise" for a new
user to Python is that adding to sys.path should allow importing a portion
of a namespace, whether or not you've already imported some other thing in
that namespace.  Symmetry with other languages and with other Python
features (e.g. changing the working directory in an interactive
interpreter) suggests it, and the removal of a similar timing dependency
from PEP 402 (preventing direct import of a namespace-only package unless
you imported a subpackage first) suggests that the same type of timing
dependency should be removed here, too.  (Note, for example, that I may not
know that importing baz.spam indirectly causes some part of foo.wiz to be
imported, and that if I then add another directory to sys.path containing a
foo.* portion, my code will *no longer work* when I try to import foo.ham.
This is much more "magical" behavior, in least-surprise terms!)

* The constraints on sys.path and package __path__ objects can and should
be removed, by making the dynamic path objects refer to a module and
attribute, instead of directly referencing parent __path__ objects.  Code
that currently manipulates __path__ will not break, because such code will
not be using PEP 420 namespace packages anyway (and so its __path__ will be
a plain list).  (Even so, the most common __path__ manipulation idiom is
"__path__ = pkgutil.extend_path(...)" anyway!)

* Namespace packages are a widely used feature of setuptools, and AFAIK
nobody has *ever* asked to stop dynamic additions to namespace __path__,
but a wide assortment of things people do with setuptools rely on dynamic
additions under the hood.  Providing the feature in PEP 420 gives a
migration path away from setuptools, at least for this one feature.
(Specifically, it does away with the need to use declare_namespace() --
see the before/after sketch at the end of this message -- and the need to
do all sys.path manipulation via setuptools' requirements API.)

* Self-contained packages (those with an __init__.py) and fixed __path__ lists can and
should be considered the "magic" or "special case" parts of importing in
Python 3, even though we're accustomed to them being central import
concepts in Python 2.  Modules and namespace packages can and should be the
default case from an instructional POV, and sys.path updating should
reflect this.  (That is, future tutorials should introduce modules, then
namespace packages, and finally self-contained packages with __init__ and
__path__, because the *idea* of a namespace package doesn't depend on
__path__ existing in the first place; it's essentially only a historical
accident that self-contained packages were implemented in Python first.)
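
P.S. To make the declare_namespace() point from the wrap-up concrete, the
before-and-after for one portion of a made-up 'foo' namespace looks roughly
like this:

    # setuptools today: every portion ships a foo/__init__.py containing
    __import__('pkg_resources').declare_namespace(__name__)
    # ...plus namespace_packages=['foo'] in each portion's setup() call.

    # Under PEP 420: the foo/__init__.py files simply go away; any
    # directory named 'foo' (with no __init__) found on sys.path
    # contributes to the namespace automatically.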