[Import-SIG] PEP proposal: Per-Module Import Path

Sat Jul 20 15:55:00 CEST 2013

On Sat, Jul 20, 2013 at 3:32 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 20 July 2013 00:51, Brett Cannon <brett at python.org> wrote:
> >> tricky to register the archive as a Python path entry when needed
> >> without polluting the path of applications that don't need it.
> >
> > How is this unique to archive-based distributions compared to any other
> > scenario where all distributions are blindly added to sys.path?
>
> Other packages and modules are just *on* an already existing sys.path
> entry. They're available, but unless you import them, they just sit
> there on disk and don't bother you.
>
> *.pth files in site-packages are different: *every* application run on
> that Python installation (without -S) will have *new* entries added to
> sys.path :(
>
> >> One of the problems with working directly from a source checkout is
> >> getting the relevant source directories onto the Python path, especially
> >> when you have multiple namespace package fragments spread across several
> >> subdirectories of a large repository.
> >>
> >
> > E.g., a source checkout for the coverage.py project might be stored in the
> > directory ``coveragepy``, but the actual source code is stored in
> > ``coveragepy/coverage``, requiring ``coveragepy`` to be on sys.path in order
> > to access the package.
>
> Yep, exactly. With the PEP, you should be able to just do "echo
> coveragepy > coverage.ref" in the current directory and the
> interpreter will be able to follow the reference when you do "import
> coverage", but it won't actually be added to sys.path.
>
> A more complex example is the source layout for Beaker, where we have
> separate Common, Client, Server, LabController and IntegrationTest
> source directories that all contribute submodules to a shared "bkr"
> namespace package. Getting those all on your path in a source checkout
> currently means plenty of PYTHONPATH manipulation and various helper
> scripts.
>
> Because it's a namespace package, we can't do this with symlinks - a
> "bkr" symlink could only reference one of the source directories, and
> we have five. With ref files (and using Python 3.4+, which may happen
> some day!), we'd be able to designate a recommended working directory
> and put a single "bkr.ref" file under source control that included the
> appropriate relative paths to make all the components available.
>
> >> Once the path is exhausted, if no `loader` was found and the `namespace
> >> portions` path is non-empty, then a `NamespaceLoader` is returned with that
> >> path.
> >>
> >> This proposal inserts a step before step 1 for each `path entry`:
> >>
> >> 0. look for `<path entry>/<name>.ref`
> >
> >
> > Why .ref? Why not .path?
>
> I think of this proposal in general as "indirect imports", but the
> notion of a "path reference" is what inspired the specific extension
> (Eric just copied it from my original email).
>
> I didn't actually consider other possible extensions (since I was
> happy with .ref and it was the first one I thought of), but I think
> .path in particular would be awfully confusing, since I (and others)
> already refer to .pth files as "dot path" files.
>
>
> >
> > Everything below should come before the import changes. It's hard to follow
> > what is really being proposed for semantics without knowing e.g. .ref files
> > can have 0 or more paths and not just a single path, etc.
>
> The proposal is indeed for what is effectively full blown recursive
> path search, starting again from the top with sys.meta_path :)
>
> The only new trick we should need is something to track which path
> entries we have already considered so we can prevent recursive loops.
>
> I don't know how Eric handled it in his draft implementation, but
> here's the kind of thing I would do:
>
> 1. Define an IndirectReference type that can be returned instead of a loader
>

Type-dependent logic is always tough to get past python-dev. And
obviously this is special to sys.meta_path (ignoring __indirect__).

> 2. FileImporter would be aware of the new type, and if it sees it
> instead of a loader:
>
> - extracts the name of the reference file (to append to __indirect__)
> - extracts the subpath (for the recursive descent)
> - removes any previously seen path segments (including any from the
> original path)
> - if there are paths left, triggers the recursive search for a loader
> (I was thinking at the sys.path_hooks level, but Eric suggested using
> sys.meta_path instead)
>

I was thinking sys.path as well because otherwise it becomes much more
complicated. If you leave out the whole __indirect__ bit you are simply
using .ref files to call find_loader() and then either accumulating the
namespace portions that are returned or returning an actual loader. But with
sys.meta_path you drop out of namespace package world and are one level up
where the ability to handle that accumulation of paths goes away (thus your
IndirectReference idea).
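
The path-level variant can be sketched roughly as follows. This is a minimal
illustration, not Eric's draft implementation. It assumes a .ref file is plain
text with one path per line, resolved relative to the directory holding the
.ref file; `find_portions` and `_read_ref` are hypothetical names, and no real
finder API is involved. A `seen` set keeps a reference that points back at an
earlier entry from looping.

```python
import os


def _read_ref(ref_file):
    """Return the paths listed in a .ref file (assumed: one per line)."""
    base = os.path.dirname(ref_file)
    with open(ref_file, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    return [os.path.normpath(os.path.join(base, line)) for line in lines if line]


def find_portions(name, path_entries, seen=None):
    """Collect directories named `name`, following `<name>.ref` files.

    Hypothetical helper: it only looks for directories, the way namespace
    portions are accumulated, and never touches sys.path.
    """
    if seen is None:
        seen = set()
    portions = []
    for entry in path_entries:
        entry = os.path.normpath(os.path.abspath(entry))
        if entry in seen:
            continue  # already searched; this is what breaks reference loops
        seen.add(entry)
        ref_file = os.path.join(entry, name + ".ref")
        if os.path.isfile(ref_file):
            # Step 0 from the proposal: follow the reference first.
            portions.extend(find_portions(name, _read_ref(ref_file), seen))
        candidate = os.path.join(entry, name)
        if os.path.isdir(candidate):
            portions.append(candidate)
    return portions
```

With a single "bkr.ref" in the working directory listing the relative paths of
the component directories, `find_portions("bkr", ["."])` would return each
component's `bkr` directory without any of them being added to sys.path.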

> - treat the result of the recursive search the same as you would a
> search of a single path segment
>
> I'm not currently sure what I would do about __indirect__ if the
> result was a namespace package. In that case, you really want it to be
> a mapping from path entries to the indirections that located them (if
> any), which suggests we may want it to be a mapping whenever
> indirection occurs (from __file__ to the indirection chain for the
> simple case), or None if no indirection was involved in loading the
> module.
>

Could you do it post-import? I mean the process of finding the .ref files
is the same, so you could just look for the files, accumulate the list, and
then see which ones ended up on __path__ and see "this is where these came
from". That does away with any potentially nasty API changes to make
__indirect__ work.
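
A rough sketch of that post-import lookup, under the same assumed
one-path-per-line .ref format and handling only a single level of reference
(no chains). `ref_origins` is a hypothetical helper, not a proposed API: it
re-scans the path entries for `<name>.ref` and reports which `__path__`
entries came from which ref file.

```python
import os


def ref_origins(name, path_entries, module_path):
    """Map each entry of a module's __path__ to the .ref file behind it.

    Entries that were found without a .ref file map to None.
    """
    origins = {}
    for entry in path_entries:
        ref_file = os.path.join(os.path.abspath(entry), name + ".ref")
        if not os.path.isfile(ref_file):
            continue
        with open(ref_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    target = os.path.normpath(
                        os.path.join(os.path.dirname(ref_file), line, name))
                    # First ref file to mention a target wins.
                    origins.setdefault(target, ref_file)
    return {
        portion: origins.get(os.path.normpath(os.path.abspath(portion)))
        for portion in module_path
    }
```

For an imported namespace package this would be called with `sys.path` and the
module's `__path__`, leaving find_loader() itself unchanged.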

>
> You couldn't have the loader handle the recursion itself, or you'd
> have trouble getting state from the previous loader to the next one
> down. Thus the recursive->iterative transformation through the
> "IndirectReference" type.
>
> (Again, though, this is just me thinking out loud - Eric's the one
> that created an actual draft implementation)
>
>
> >> In order to facilitate that, modules will have a new attribute:
> >> `__indirect__`.  It will be a tuple comprised of the chain of ref files, in
> >> order, used to locate the module's __file__.  An empty tuple or a tuple with
> >> one item will be the most common case.  An empty tuple indicates that no ref
> >> files were used to locate the module.
> >
> > This complicates things even further. How are you going to pass this info
> > along a call chain through find_loader()? Are we going to have to add
> > find_loader3() to support this (nasty side-effect of using tuples instead of
> > types.SimpleNamespace for the return value)? Some magic second value or type
> > from find_loader() which flags the values in the iterable are from a .ref
> > file and not any other possible place? This requires an API change and there
> > isn't any mention of how that would look or work.
>
> Good question. This was a last minute addition just before Eric posted
> the draft. I still think it's something we should try to offer, and I
> suspect whatever mechanism is put in place to prevent recursive loops
> should be able to handle propagating this information (as described in
> my sketch above).
>

Sure, I'm not saying that it wouldn't be useful, I'm just wondering how it
would be pulled off without yet another change in the finder API (I'm
really coming to wish we returned a types.SimpleNamespace instead of a
tuple for find_loader()).
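
A toy illustration of the tuple problem. The two functions below are made-up
stand-ins, not the importlib API, and the `indirect` field is only an assumed
name. A caller that unpacks a two-item tuple breaks as soon as a third item is
added, while attribute access on a types.SimpleNamespace keeps working when a
new field shows up.

```python
from types import SimpleNamespace


def find_loader_tuple(fullname):
    # Stand-in for a finder that grew a third return value.
    return None, ["/some/portion"], ("bkr.ref",)


def find_loader_ns(fullname):
    # Same information, but new fields can be added without breaking callers.
    return SimpleNamespace(loader=None, portions=["/some/portion"],
                           indirect=("bkr.ref",))


try:
    loader, portions = find_loader_tuple("bkr")  # old two-item caller
except ValueError:
    old_caller_broke = True
else:
    old_caller_broke = False

result = find_loader_ns("bkr")
loader, portions = result.loader, result.portions  # old caller still works
```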

-Brett

>
> >> This is an undesirable side effect of the way `*.pth` processing is
> >> defined, but can't be changed due to backwards compatibility issues.
> >>
> >> Furthermore, `*.pth` files are processed at interpreter startup...
> >
> >
> > That's a moot point; .ref files can be as well if they are triggered as part
> > of an import.
>
> The difference is that ref files will only be triggered for modules
> you actually import. *Every* .pth file in site-packages is processed
> at interpreter startup, with startup time implications for all Python
> code run on that system.
>
> > A bigger concern is that they execute arbitrary Python code which could be
> > viewed as an unexpected security risk. Some might complain about the
> > difficulty then of loading non-standard importers, but that really should be
> > the duty of the code performing the import and not the distribution itself;
> > IOW I would argue that it is up to the user to get things in line to use a
> > distribution in the format they choose to use it instead of the distribution
> > dictating how it should be bundled.
>
> Agreed, we should mention this (it's one of the reasons Linux distros
> aren't fond of *.pth files).
>
> We should also note that, unlike *.pth files, *.ref files would work
> even with the "-S" switch, since they don't rely on the site module
> making additions to sys.path.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>