[Python-Dev] PEP 420 - dynamic path computation is missing rationale

Nick Coghlan ncoghlan at gmail.com
Tue May 22 01:25:52 CEST 2012


As a simple example to back up PJE's explanation, consider:
1. encodings becomes a namespace package
2. It sometimes gets imported during interpreter startup to initialise the
standard io streams
3. An application modifies sys.path after startup and wants to contribute
additional encodings

Searching the entire parent path for new portions on every import would be
needlessly slow.

Not recognising new portions would be needlessly confusing for users. In
our simple case above, the application would fail if the io initialisation
accessed the encodings package, but work if it did not (e.g. when all
streams are utf-8).

PEP 420 splits the difference via an automatically invalidated cache: when
you iterate over a namespace package __path__ object, it rescans the parent
path for new portions *if and only if* the contents of the parent path have
changed since the previous scan.
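
As a rough illustration of that caching behaviour (a sketch only, not the
PEP 420 implementation): the DynamicPath class, its parameter names and the
encodings_demo directory name below are all made up for this example. It
follows PJE's suggestion of referring to the parent path by module name and
attribute, so the same class would serve top-level packages ('sys', 'path')
and subpackages ('foo', '__path__').

```python
import os
import sys
import tempfile


class DynamicPath:
    """Hypothetical sketch of a self-refreshing namespace __path__."""

    def __init__(self, package_name, parent_module="sys", parent_attr="path"):
        # Only the last component names the directory to look for.
        self._leaf = package_name.rpartition(".")[2]
        self._parent_module = parent_module
        self._parent_attr = parent_attr
        self._last_parent = None  # snapshot of the parent path at last scan
        self._portions = []

    def _parent_path(self):
        # Refetch by name every time, so rebinding sys.path is noticed too.
        parent = getattr(sys.modules[self._parent_module], self._parent_attr)
        return tuple(parent)

    def _refresh(self):
        parent = self._parent_path()
        if parent != self._last_parent:
            # The parent path changed since the previous scan: rescan it.
            self._portions = [
                os.path.join(entry, self._leaf)
                for entry in parent
                if os.path.isdir(os.path.join(entry, self._leaf))
            ]
            self._last_parent = parent
        return self._portions

    def __iter__(self):
        return iter(self._refresh())

    def __len__(self):
        return len(self._refresh())

    def __repr__(self):
        # Deliberately not the repr of a plain list.
        return "DynamicPath({!r})".format(self._refresh())


# Two directories that each hold a portion of the same namespace.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
os.mkdir(os.path.join(root_a, "encodings_demo"))
os.mkdir(os.path.join(root_b, "encodings_demo"))

sys.path.append(root_a)
path = DynamicPath("encodings_demo")
first_scan = list(path)   # scanned before root_b is on sys.path

sys.path.append(root_b)   # the application modifies sys.path after startup
second_scan = list(path)  # parent path changed, so the portions are rescanned
```

Iterating again without touching sys.path reuses the cached list of
portions, so the common case costs one tuple comparison rather than a
directory scan.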

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)
On May 22, 2012 4:10 AM, "PJ Eby" <pje at telecommunity.com> wrote:

> On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <guido at python.org> wrote:
>
>> Ah, I see. But I disagree that this is a reasonable constraint on
>> sys.path. The magic __path__ object of a toplevel namespace module
>> should know it is a toplevel module, and explicitly refetch sys.path
>> rather than just keeping around a copy.
>>
>
> That's fine by me - the class could actually be defined to take a module
> name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then
> there'd be no need to special case anything: it would behave exactly the
> same way for subpackages and top-level packages.
>
>
>> This leaves the magic __path__ objects for namespace modules, which I
>> could live with, as long as their repr was not the same as a list, and
>> assuming a good rationale is given. Although I'd still prefer plain
>> lists here as well; I'd like to be able to manually construct a
>> namespace package and force its directories to be a specific set of
>> directories that I happen to know about, regardless of whether they
>> are related to sys.path or not. And I'd like to know that my setup in
>> that case would not be disturbed by changes to sys.path.
>>
>
> To do that, you just assign to __path__, the same as now, a la __path__ =
> pkgutil.extend_path().  The auto-updating is in the initially-assigned
> __path__ object, not the module object or some sort of generalized magic.
>
>
>> I'd like to hear more about this from Philip -- is that feature
>> actually widely used?
>
>
> Well, it's built into setuptools, so yes.  ;-)  It gets used any time a
> dynamically specified dependency is used that might contain a namespace
> package.  This means, for example, that every setup script out there using
> "setup.py test", every project using certain paste.deploy features...  it's
> really difficult to spell out the scope of things that are using this, in
> the context of setuptools and distribute, because there are an immense
> number of ways to indirectly rely on it.
>
> This doesn't mean that the feature can't continue to be implemented inside
> setuptools' dynamic dependency system, but the code to do it in setuptools
> is MUCH more complicated than the PEP 420 code, and doesn't work if you
> manually add something to sys.path without asking setuptools to do it.
> It's also somewhat timing-sensitive, depending on when and whether you
> import 'site' and pkg_resources, and whether you are mixing eggs and
> non-eggs in your namespace packages.
>
> In short, the implementation is a huge mess that the PEP 420 approach
> would vastly simplify.
>
> But...  that wasn't the original reason why I proposed it.  The original
> reason was simply that it makes namespace packages act more like the
> equivalents do in other languages.  While being able to override __path__
> can be considered a feature of Python, its being static by default is NOT a
> feature, in the same way that *requiring* an __init__.py is not really a
> feature.
>
> The principle of least surprise says (at least IMO) that if you add a
> directory to sys.path, you should be able to import stuff from it.  That
> whether it works depends on whether or not you already imported part of a
> namespace package earlier is both surprising and confusing.  (More on this
> below.)
>
>
>
>> What would a package have to do if the feature didn't exist?
>
>
> Continue to depend on setuptools to do it for them, or use some
> hypothetical update API...   but that's not really the right question.  ;-)
>
> The right question is, what would happen to package *users* if the feature
> didn't exist?
>
> And the answer to that question is, "you must call this hypothetical
> update API *every time* you change sys.path, because otherwise your imports
> might break, depending on whether or not some other package imported
> something from a namespace before you changed sys.path".
>
> And of course, you also need to make sure that any third-party code you
> use does this too, if it adds something to sys.path for you.
>
> And if you're writing cross-Python-version code, you need to check to make
> sure whether the API is actually available.
>
> And if you're someone helping Python newbies, you need to add this to your
> list of debugging questions for import-related problems.
>
> And remember: if you forget to do this, it might not break now.  It'll
> break later, when you add that other plugin or update that random module
> that dynamically decides to import something that just happens to be in a
> namespace package, so be prepared for it to break your application in the
> field, when an end-user is using it with a collection of plugins that you
> haven't tested together, or in the same import sequence...
>
> The people using setuptools won't have these problems, but *new* Python
> users will, as people begin using a PEP 420 that lacks this feature.
>
> The key scope question, I think, is: "How often do programs change
> sys.path at runtime, and what have they imported up to that point?"
> (Because for the other part of the scope, I think it's a fairly safe bet
> that namespace packages are going to become even *more* popular than they
> are now, once PEP 420 is in place.)
>
> But the key API/usability question is: "What's the One Obvious Way to
> add/change what's importable?"
>
> And I believe the answer to that question is, "change sys.path", not
> "change sys.path, and then import some other module to call another API to
> say, 'yes, I really *meant* to update sys.path, thank you very much.'"
>
> (Especially since NOT requiring that extra API isn't going to break any
> existing code.)
>
>
>
>> I'd really much rather not have this feature, which
>> reeks of too much magic to me. (An area where Philip and I often
>> disagree. :-)
>>
>
> My take on it is that it only SEEMS like magic, because we're used to
> static __path__.  But other languages don't have per-package __path__ in
> the first place, so there's nothing to "automatically update", and so it's
> not magic at all that other subpackages/modules can be found when the
> system path changes!
>
> So, under the PEP 420 approach, it's *static* __path__ that's really the
> weird special case, and should be considered so.  (After all, __path__ is
> and was primarily an implementation optimization and compatibility hack,
> rather than a user-facing "feature" of the import system.)
>
> For example, when *would* you want to explicitly spell out a namespace
> package __path__, and restrict it from seeing sys.path changes?  I've not
> seen *anybody* ask for this feature in the context of setuptools; it's only
> ever been bug reports about when the more complicated implementation fails
> to detect an update.
>
> So, to wrap up:
>
> * The primary rationale for the feature is that "least surprise" for a new
> user to Python is that adding to sys.path should allow importing a portion
> of a namespace, whether or not you've already imported some other thing in
> that namespace.  Symmetry with other languages and with other Python
> features (e.g. changing the working directory in an interactive
> interpreter) suggests it, and the removal of a similar timing dependency
> from PEP 402 (preventing direct import of a namespace-only package unless
> you imported a subpackage first) suggests that the same type of timing
> dependency should be removed here, too.  (Note, for example, that I may not
> know that importing baz.spam indirectly causes some part of foo.wiz to be
> imported, and that if I then add another directory to sys.path containing a
> foo.* portion, my code will *no longer work* when I try to import foo.ham.
> This is much more "magical" behavior, in least-surprise terms!)
>
> * The constraints on sys.path and package __path__ objects can and should
> be removed, by making the dynamic path objects refer to a module and
> attribute, instead of directly referencing parent __path__ objects.  Code
> that currently manipulates __path__ will not break, because such code will
> not be using PEP 420 namespace packages anyway (and so, __path__ will be a
> list).  (Even so, the most common __path__ manipulation idiom is "__path__ =
> pkgutil.extend_path(...)" anyway!)
>
> * Namespace packages are a widely used feature of setuptools, and AFAIK
> nobody has *ever* asked to stop dynamic additions to namespace __path__,
> but a wide assortment of things people do with setuptools rely on dynamic
> additions under the hood.  Providing the feature in PEP 420 gives a
> migration path away from setuptools, at least for this one feature.
> (Specifically, it does away with the need to use declare_namespace(), and
> the need to do all sys.path manipulation via setuptools' requirements API.)
>
> * Self-contained packages (with __init__.py) and fixed __path__ lists can and
> should be considered the "magic" or "special case" parts of importing in
> Python 3, even though we're accustomed to them being central import
> concepts in Python 2.  Modules and namespace packages can and should be the
> default case from an instructional POV, and sys.path updating should
> reflect this.  (That is, future tutorials should introduce modules, then
> namespace packages, and finally self-contained packages with __init__ and
> __path__, because the *idea* of a namespace package doesn't depend on
> __path__ existing in the first place; it's essentially only a historical
> accident that self-contained packages were implemented in Python first.)
>
>