[Import-SIG] PEP proposal: Per-Module Import Path

Fri Jul 19 16:51:05 CEST 2013

If this can lead to the deprecation of .pth files then I support the idea,
but I think there are technical issues in terms of implementation that have
not been thought through yet. This is going to require an implementation
(even if it isn't in importlib._bootstrap but as a subclass of
importlib.machinery.FileFinder or something) to see how you plan to make
all of this work before this PEP can move beyond this SIG.

On Thu, Jul 18, 2013 at 6:10 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:

> Hi,
>
> Nick talked me into writing this PEP, so blame him for the idea. <wink>  I
> haven't had a chance to polish it up, but the content communicates the
> proposal well enough to post here.  Let me know what you think.  Once some
> consensus is reached I'll commit the PEP and post to python-dev.  I have a
> rough implementation that I'll put online when I get a chance.
>
> If Guido is not interested maybe Brett would like to be BDFL-Delegate. :)
>
> -eric
>
>
> PEP: 4XX
> Title: Per-Module Import Path
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow <ericsnowcurrently at gmail.com>
>         Nick Coghlan <ncoghlan at gmail.com>
> BDFL-Delegate: ???
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 17-Jul-2013
> Python-Version: 3.4
> Post-History: 18-Jul-2013
> Resolution:
>
>
> Abstract
> ========
>
> Path-based import of a module or package involves traversing ``sys.path``
> or a package path to locate the appropriate file or directories.
> Redirecting from there to other locations is useful for packaging and
> for virtual environments.  However, in practice such redirection is
> currently either `limited or fragile <Existing Alternatives_>`_.
>
> This proposal provides a simple filesystem-based method to redirect from
> the normal module search path to other locations recognized by the
> import system.  This involves one change to path-based imports, adds one
> import-related file type, and introduces a new module attribute.  One
> consequence of this PEP is the deprecation of ``.pth`` files.
>
>
> Motivation
> ==========
>
> One of the problems with virtual environments is that you are likely to
> end up with duplicate installations of lots of common packages, and
> keeping them up to date can be a pain.
>
> One of the problems with archive-based distribution is that it can be
>

You say "One of the problems" at the start of 3/4 of the paragraphs in this
section. Variety is the spice of life. =) Try "Another problem is that for
archive-based", etc.

> tricky to register the archive as a Python path entry when needed
> without polluting the path of applications that don't need it.
>

How is this unique to archive-based distributions compared to any other
scenario where all distributions are blindly added to sys.path?

>
> One of the problems with working directly from a source checkout is
> getting the relevant source directories onto the Python path, especially
> when you have multiple namespace package fragments spread across several
> subdirectories of a large repository.
>
>
E.g., a source checkout for the coverage.py project might be stored in the
directory ``coveragepy``, but the actual source code is stored in
``coveragepy/coverage``, requiring ``coveragepy`` to be on sys.path in
order to access the package.

> The `current solutions <Existing Alternatives_>`_ all have their flaws.
> Reference files are intended to address those deficiencies.
>
>
> Specification
> =============
>
> Change to the Import System
> -----------------------------
>
> Currently, during `path-based import` of a module, the following happens
> for each `path entry` of `sys.path` or of the `__path__` of the module's
> parent:
>
> 1. look for `<path entry>/<name>/__init__.py` (and other supported
>    suffixes),
>
>    * return `loader`;
>
> 2. look for `<path entry>/<name>.py` (and other supported suffixes),
>
>    * return `loader`;
>
> 3. look for `<path entry>/<name>/`,
>
>    * extend namespace portions path.
>
>
Please capitalize the first letter of each bullet point (here and the rest
of the PEP). Reads better since they are each separate sentences.

> Once the path is exhausted, if no `loader` was found and the `namespace
> portions` path is non-empty, then a `NamespaceLoader` is returned with that
> path.
>
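For reference, the three per-entry steps above plus the namespace fallback fit in a few lines of pure Python. This is a simplified model, not FileFinder's actual code: it ignores caching, case sensitivity and the loader objects themselves, and `find_in_path` is a hypothetical name. The only real API it relies on is `importlib.machinery.all_suffixes()` for the list of supported suffixes.

```python
import os
from importlib.machinery import all_suffixes


def find_in_path(name, path_entries):
    """Simplified model of today's per-entry search for one module name.

    Returns ("module", filename) for the first package or module found,
    ("namespace", portions) if only bare directories matched, or
    (None, []) if nothing matched at all.
    """
    portions = []
    for entry in path_entries:
        pkg_dir = os.path.join(entry, name)
        # Step 1: a regular package, <entry>/<name>/__init__.<suffix>
        for suffix in all_suffixes():
            init = os.path.join(pkg_dir, "__init__" + suffix)
            if os.path.isfile(init):
                return "module", init
        # Step 2: a module, <entry>/<name>.<suffix>
        for suffix in all_suffixes():
            module = os.path.join(entry, name + suffix)
            if os.path.isfile(module):
                return "module", module
        # Step 3: a directory without __init__ becomes a namespace portion
        if os.path.isdir(pkg_dir):
            portions.append(pkg_dir)
    # Only once the whole path is exhausted do portions form a namespace package
    if portions:
        return "namespace", portions
    return None, []
```

The point to notice is that namespace portions are only turned into a package after every entry has been tried, whereas a regular package or module ends the search immediately.
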
> This proposal inserts a step before step 1 for each `path entry`:
>
> 0. look for `<path entry>/<name>.ref`
>

Why .ref? Why not .path?

>   a. get loader for `<fullname>` (absolute module name) using the path
>      found in the `.ref` file (see below) and `the normal mechanism`[link
>      to language reference],
>
>      * stop processing the path entry if the `.ref` file is empty;
>

You should clarify how you plan to "get loader". You will have to find the
proper finder as well in case the .ref file references a zip file or
something which requires a different finder than the one which came across
the .ref file.

>   b. check for `NamespaceLoader`,
>     * extend namespace portions path;
>   c. otherwise, return loader.
>
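To make the proposed ordering concrete, here is a sketch of steps 0a-0c in front of steps 1-3, with the filesystem modeled as a dict and two sets so it runs anywhere. Several things are assumptions not settled by the draft: only `.py` files are considered, the paths listed in a ref file are searched by recursing into the same function rather than going back through `sys.meta_path`, a ref file whose paths yield nothing simply moves the search on to the next entry, and there is no protection against ref files that refer to each other in a cycle. `find_with_refs` is a hypothetical name.

```python
import posixpath


def find_with_refs(name, path_entries, refs, files, dirs):
    """Sketch of the proposed search, with step 0 inserted before steps 1-3.

    refs maps "<entry>/<name>.ref" paths to the entries listed in them;
    files and dirs are sets of existing file and directory paths.
    Returns ("module", filename), ("namespace", portions) or (None, []).
    """
    portions = []
    for entry in path_entries:
        ref = posixpath.join(entry, name + ".ref")
        if ref in refs:
            # Step 0: a ref file shadows everything else under this entry.
            if not refs[ref]:
                continue  # step 0a: empty ref file, stop processing this entry
            kind, result = find_with_refs(name, refs[ref], refs, files, dirs)
            if kind == "namespace":
                portions.extend(result)  # step 0b: extend namespace portions
            elif kind == "module":
                return kind, result  # step 0c: return what was found
            continue  # assumption: nothing found, move on to the next entry
        pkg = posixpath.join(entry, name)
        init = posixpath.join(pkg, "__init__.py")
        if init in files:
            return "module", init  # step 1: regular package
        if pkg + ".py" in files:
            return "module", pkg + ".py"  # step 2: module
        if pkg in dirs:
            portions.append(pkg)  # step 3: namespace portion
    if portions:
        return "namespace", portions
    return None, []
```

For instance, with `refs = {"/a/spam.ref": ["/b"]}` and both `/a/spam.py` and `/b/spam.py` present, `find_with_refs("spam", ["/a"], refs, files, set())` returns `("module", "/b/spam.py")`: the ref file wins over the module sitting right next to it.
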
> Note the following consequences:
>
> * if a ref file is found, it takes precedence over module files and
>   package directories under the same path entry (see `Empty Ref Files as
>   Markers`_);
> * that holds for empty ref files also;
> * the loader for a ref file, if any, comes from the full import system
>   (i.e. `sys.meta_path`) rather than just the path-based import system;
> * `.ref` files can indirectly provide fragments for namespace packages.
>

This ramification for namespace packages makes the proposed semantic change
a bit trickier than you are suggesting, since you are essentially doing a
recursive path entry search. And is that possible? If I have a .ref file
that refers to a path which itself has a .ref file, will that then lead to
another search? It seems like you are going to be doing ``return
importlib.find_loader(fullname, paths_found_in_ref) if paths_found_in_ref
else None, []`` from within a finder which finds a .ref file, which itself
would support a recursive search.

Everything below should come before the import changes. It's hard to follow
what is really being proposed for semantics without knowing, e.g., that .ref
files can have zero or more paths and not just a single path.

> Reference Files
> ---------------
>
> A new kind of file will live alongside package directories and module
> source files: reference files.  These files have the following
> characteristics:
>
> * named `<module name>.ref` in contrast to `<module name>.py` (etc.) or
>   `<module name>/`;
> * placed under `sys.path` entries or package path (just like modules and
>   packages).
>
> Reference File Format
> ----------------------
>
> The contents of a reference file will conform to the following format:
>
> * contain zero or more path entries, just like sys.path;
> * one path entry per line;
> * path entry order is preserved;
> * may contain comment lines starting with "#", which are ignored;
> * may contain blank lines, which are ignored;
> * must be UTF-8 encoded.
>
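A parser for the format above fits in a few lines. This sketch assumes that leading and trailing whitespace on a line is not significant and that a "#" may be preceded by whitespace; the draft doesn't say either way. `parse_ref_file` is a hypothetical name.

```python
def parse_ref_file(data):
    """Parse the raw bytes of a ref file into an ordered list of path entries.

    Follows the draft format: UTF-8 encoded, one entry per line, order
    preserved, blank lines and "#" comment lines ignored.
    """
    entries = []
    # decode() raises UnicodeDecodeError for anything that isn't valid UTF-8
    for line in data.decode("utf-8").splitlines():
        stripped = line.strip()  # assumption: surrounding whitespace is noise
        if not stripped or stripped.startswith("#"):
            continue
        entries.append(stripped)
    return entries
```

One thing the sketch surfaces: a file containing only comments and blank lines parses to the same empty list as a zero-byte file, so the draft should say whether such a file counts as "empty" for the marker semantics.
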
> Directory Path Entries
> ----------------------
>
> Directory names are by far the most common type of path entry.  Here is
> how they are constrained in reference files:
>
> * may be absolute or relative;
> * must be forward slash separated regardless of platform;
> * each must be the parent directory where the module will be looked for.
>
> To be clear, reference files (just like `sys.path`) deliberately reference
> the *parent* directory to be searched (rather than the module or package
> directory).  This means they work transparently with `__pycache__` and
> allow searching for ``.dist-info`` directories (:pep:`376`) through them.
>
> Relative directory names will be resolved based on the directory
> containing the ref file, rather than the current working directory.
> Allowing relative directory names means a source repository can include
> sensible ref files.
>
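Resolving a single directory entry might look like this. Assumptions: forward slashes are converted with a plain `str.replace`, the result is normalized with `os.path.normpath`, and `~` is *not* expanded (the examples below use `~/...`, but the draft doesn't say whether that gets expanded, so under this sketch it would be treated as a relative name). `resolve_entry` is a hypothetical name.

```python
import os


def resolve_entry(entry, ref_file):
    """Turn one forward-slash entry from ref_file into a native path.

    Relative entries are anchored at the directory containing the ref file,
    not at the current working directory.
    """
    native = entry.replace("/", os.sep)
    base = os.path.dirname(os.path.abspath(ref_file))
    # os.path.join() discards base when native is already absolute.
    return os.path.normpath(os.path.join(base, native))
```

With the "Submodule" example's `tests.ref` at `/home/me/myproject/myproject/tests.ref`, the entry `../` resolves to `/home/me/myproject`, which is where `tests/` is then looked for.
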
> Empty Ref Files as Markers
> -----------------------------
>
> Handling `.ref` files first allows for the use of empty ref files as
> markers to indicate "this is not the module you are looking for".  Here are
> two situations where that helps.
>

"Here" -> "There"

>
> First, an empty ref file helps resolve conflicts between script names and
> package names.  When the interpreter is started with a filename, the
> directory of that script is added to the front of `sys.path`.  This may be
> a problem for later imports where the intended module or package is on a
> regular path entry.
>
> If an import references the script's name, the file will get run again by
> the import system as a module (only `__main__` was added to `sys.modules`
> earlier) (see :pep:`395`).  This is a further problem if you meant to import a
> module or package in another path entry.
>
> The presence of an empty ref file in the script's directory would
> essentially render it invisible to the import system.  This problem and
> solution apply for all of the files or directories in the script's
> directory.
>
> Second, the namespace package mechanism has a side-effect: a directory
> without an ``__init__.py`` may be incorrectly treated as a namespace package
> fragment.  The presence of an empty ref file indicates such a directory
> should be ignored.
>
> A Module Attribute to Expose Contributing Ref Files
> ---------------------------------------------------
>
> Knowing the origin of a module is important when tracking down problems,
> particularly import-related ones.  Currently, that entails looking at
> `<module>.__file__` and `<module.__package__>.__path__` (or `sys.path`).
>
> With this PEP there can be a chain of ref files in between the currently
> available path and a module's __file__.  Having access to that list of ref
> files is important in order to determine why one file was selected over
> another as the origin for the module.  When an unexpected file gets used
> for one of your imports, you'll care about this!
>
> In order to facilitate that, modules will have a new attribute:
> `__indirect__`.  It will be a tuple made up of the chain of ref files, in
> order, used to locate the module's `__file__`.  An empty tuple or a
> one-item tuple will be the most common case.  An empty tuple indicates
> that no ref files were used to locate the module.
>
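Here is a sketch of how the `__indirect__` chain could be accumulated while following ref files, again with the filesystem modeled as a dict (`refs`) and a set (`files`), and only `.py` modules considered. It also touches on the recursion question raised above by refusing to revisit a ref file already in the chain; raising `ImportError` on such a cycle is an assumption, since the draft doesn't specify cycle handling. `resolve_chain` is a hypothetical name.

```python
import posixpath


def resolve_chain(name, path_entries, refs, files, chain=()):
    """Find name's file while recording the ref files followed to reach it.

    refs maps "<entry>/<name>.ref" paths to the entries they list, and files
    is a set of existing module paths; together they stand in for the
    filesystem. Returns (filename, chain) or None, where chain is the tuple
    that would become the module's __indirect__.
    """
    for entry in path_entries:
        ref = posixpath.join(entry, name + ".ref")
        if ref in refs:
            if ref in chain:
                # Assumption: a ref file reached twice is a cycle and an error.
                raise ImportError("ref file cycle: " + " -> ".join(chain + (ref,)))
            if refs[ref]:
                found = resolve_chain(name, refs[ref], refs, files, chain + (ref,))
                if found is not None:
                    return found
            # Empty ref file, or nothing found through it: skip this entry.
            continue
        module = posixpath.join(entry, name + ".py")
        if module in files:
            return module, chain
    return None
```

Run against the "Chained Ref Files" example (with absolute paths substituted), it returns the clone's `spam.py` together with both ref files in the order they were followed, matching the `spam.__indirect__` value shown there.
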

This complicates things even further. How are you going to pass this info
along a call chain through find_loader()? Are we going to have to add
find_loader3() to support this (nasty side-effect of using tuples instead
of types.SimpleNamespace for the return value)? Or some magic second value
or type from find_loader() which flags that the values in the iterable came
from a .ref file and not from any other possible place? This requires an API
change and there isn't any mention of how that would look or work.

>
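To make the return-value concern concrete: callers currently unpack `finder.find_loader()` as a 2-tuple, so growing the tuple breaks them, whereas a `types.SimpleNamespace` result could gain a field without disturbing callers that read attributes. All function names below are hypothetical stand-ins, not importlib APIs.

```python
import types


def tuple_result(loader, portions):
    # The current shape of finder.find_loader(): a bare 2-tuple.
    return loader, portions


def grown_tuple_result(loader, portions, indirect):
    # Hypothetical: add the ref file chain as a third tuple element.
    return loader, portions, indirect


def namespace_result(loader, portions, indirect=()):
    # Hypothetical: the same data as attributes on a SimpleNamespace.
    return types.SimpleNamespace(loader=loader, portions=portions, indirect=indirect)


def unpacking_caller(result):
    """Existing-style caller that unpacks exactly two values."""
    loader, portions = result
    return portions


def attribute_caller(result):
    """Caller that reads only the attributes it knows about."""
    return result.portions
```

Passing the three-element tuple to `unpacking_caller` raises `ValueError` (too many values to unpack), while `attribute_caller` keeps working after `indirect` is added.
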
> Examples
> --------
>
> XXX are these useful?
>

Yes, if you change these to use pip or setuptools and also show how you
could point to version-specific distributions.

>
> Top-level module (`import spam`)::
>
>   ~/venvs/ham/python/site-packages/
>       spam.ref
>
>   spam.ref:
>       # use the system installed module
>       /python/site-packages
>
>   /python/site-packages:
>       spam.py
>
>   spam.__file__:
>       "/python/site-packages/spam.py"
>
>   spam.__indirect__:
>       ("~/venvs/ham/python/site-packages/spam.ref",)
>
> Submodule (`python -m myproject.tests`)::
>
>   ~/myproject/
>       setup.py
>       tests/
>           __init__.py
>           __main__.py
>       myproject/
>           __init__.py
>           tests.ref
>
>   tests.ref:
>       ../
>
>   myproject.__indirect__:
>       ()
>
>   myproject.tests.__file__:
>       "~/myproject/tests/__init__.py"
>
>   myproject.tests.__indirect__:
>       ("~/myproject/myproject/tests.ref",)
>
> Multiple Path Entries::
>
>   myproj/
>       __init__.py
>       mod.ref
>
>   mod.ref:
>       # fall back to the old one
>       /python/site-packages/mod-new/
>       /python/site-packages/mod-old/
>
>   /python/site-packages/
>       mod-old/
>           mod.py
>
>   myproj.mod.__file__:
>       "/python/site-packages/mod-old/mod.py"
>
>   myproj.mod.__indirect__:
>       ("myproj/mod.ref",)
>
> Chained Ref Files::
>
>   venvs/ham/python/site-packages/
>       spam.ref
>
>   venvs/ham/python/site-packages/spam.ref:
>       # use the system installed module
>       /python/site-packages
>
>   /python/site-packages/
>       spam.ref
>
>   /python/site-packages/spam.ref:
>       # use the clone
>       ~/clones/myproj/
>
>   ~/clones/myproj/
>       spam.py
>
>   spam.__file__:
>       "~/clones/myproj/spam.py"
>
>   spam.__indirect__:
>       ("venvs/ham/python/site-packages/spam.ref",
>        "/python/site-packages/spam.ref")
>
> Reference Implementation
> ------------------------
>
> A reference implementation is available at <TBD>.
>
> XXX double-check zipimport support
>
>
> Deprecation of .pth Files
> =============================
>
> The `site` module facilitates the composition of `sys.path`.  As part of
> that, `.pth` files are processed and entries added to `sys.path`.  Ref
> files are intended as a replacement.
>
> XXX also deprecate .pkg files (see pkgutil.extend_path())?
>
> Consequently, `.pth` files will be deprecated.
>

Link to "Existing Alternatives" discussion as to why this deprecation is
desired.

>
> Deprecation Schedule
> -------------------------
>
> 1. documented: 3.4,
> 2. warnings: 3.5 and 3.6,
> 3. removal: 3.7
>
> XXX Deprecate sooner?
>
>
> Existing Alternatives
> =====================
>
> .pth Files
> ----------
>
> `*.pth` files have the problem that they're global: if you add them to
> `site-packages`, they will be processed at startup by *every* Python
> application run using that Python installation.
>

"... thanks to them being processed by the site module instead of by the
import system and individual finders."

> This is an undesirable side effect of the way `*.pth` processing is
> defined, but can't be changed due to backwards compatibility issues.
>
> Furthermore, `*.pth` files are processed at interpreter startup...
>

That's a moot point; .ref files can be as well if they are triggered as
part of an import.

A bigger concern is that they can execute arbitrary Python code (the site
module exec()s any line beginning with "import" followed by a space or tab),
which could be viewed as an unexpected security risk. Some might complain
about the resulting difficulty of loading non-standard importers, but that
really should be the duty of the code performing the import and not of the
distribution itself; IOW I would argue that it is up to the user to get
things in line to use a distribution in whatever format they choose, instead
of the distribution dictating how it should be bundled.

>
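For comparison with the ref file format, here is a simplified sketch of how `site` treats the lines of a `.pth` file: comment lines are skipped, lines starting with `import ` or `import\t` are handed to `exec()`, and anything else is taken as a directory, joined to the site directory and appended to `sys.path` if it exists and isn't already there (those last checks are omitted). The sketch only classifies lines and runs nothing; `classify_pth_lines` is a hypothetical name, not site's own code.

```python
def classify_pth_lines(text):
    """Sort .pth lines the way site handles them, without running anything.

    Returns (executed, added): lines site would pass to exec() and lines it
    would treat as directories to append to sys.path.
    """
    executed, added = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if line.startswith(("import ", "import\t")):
            executed.append(line)  # site exec()s this at interpreter startup
            continue
        line = line.rstrip()
        if line:
            added.append(line)  # site joins it to the site dir, checks it exists
    return executed, added
```

Nothing in the proposed ref file format corresponds to the `exec()` branch, which is what removes the arbitrary-code concern.
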
> .egg-link files
> ---------------
>
> `*.egg-link` files are much closer to the proposed `*.ref` files. The
> difference is that `*.egg-link` files are designed to work with
> `pkg_resources` and `distribution names`, while `*.ref` files are designed
> to work with package and module names as an automatic part of the import
> system.
>
> Symlinks
> ---------
>
> Actual symlinks have the problem that they aren't really practical on
> Windows, and also that they don't support non-versioned references to
> versioned `dist-info` directories.
>
> Design Alternatives
> ===================
>
> Ignore Empty Ref Files
> ----------------------
>
> An empty ref file would be ignored rather than effectively stopping the
> processing of the path entry.  This loses the benefits outlined above of
> empty ref files as markers.
>
> ImportError for Empty Ref Files
> -------------------------------
>
> An empty ref file would result in an ImportError.  The only benefit to
> this would be to disallow empty ref files and make it clear when they are
> encountered.
>
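The three candidate behaviours for an empty ref file (the proposal and the two alternatives above) differ in a single branch. This sketch models one path entry by whether it has a `<name>.ref` file (and what it lists) and whether `<name>.py` sits next to it; `EmptyRefPolicy` and `lookup_entry` are hypothetical names.

```python
import enum


class EmptyRefPolicy(enum.Enum):
    STOP = "stop"      # the proposal: an empty ref file hides the entry
    IGNORE = "ignore"  # alternative: act as if the ref file were absent
    ERROR = "error"    # alternative: reject empty ref files


def lookup_entry(ref_entries, has_module, policy=EmptyRefPolicy.STOP):
    """Decide what one path entry yields for a module name.

    ref_entries is None when there is no <name>.ref file, otherwise the
    (possibly empty) list of entries it contains; has_module says whether
    <name>.py exists under the same entry.
    """
    if ref_entries is None:
        return "module" if has_module else None
    if ref_entries:
        return "follow ref"  # non-empty ref files behave the same in all three
    # Empty ref file: the only place the policies differ.
    if policy is EmptyRefPolicy.STOP:
        return None
    if policy is EmptyRefPolicy.IGNORE:
        return "module" if has_module else None
    raise ImportError("empty ref file")
```

Only STOP lets an empty ref file hide a `<name>.py` in the same entry, which is the marker behaviour the proposal relies on.
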
> Handle Ref Files After Namespace Packages
> -----------------------------------------
>
> Rather than handling ref files first, they could be handled last.  Thus
> they would have lower priority than namespace package fragments.  This
> would be only marginally more backward compatible.  However, as with
> ignoring empty ref files, handling them last would prevent their use as
> markers for ignoring a path entry.
>
> Send Ref File Path Through Path Import System Only
> --------------------------------------------------
>
> As indicated above, the path entries in a ref file are passed back through
> the metapath finders to get the loader.  Instead we could use just the
> path-based import system.  This would prevent metapath finders from having
> a chance to handle the module under a different path.
>
> Restrict Ref File Path Entries to Directories
> ---------------------------------------------
>
> Rather than allowing anything for the path entries in a ref file, they
> could be restricted to just directories.  This is by far the most common
> case.  However, it would add complexity without any justification for not
> allowing metapath importers a chance at the module under a new path.
>
> Restrict Directories in Ref File Path Entries to Absolute
> ---------------------------------------------------------
>
> Directory path entries in ref files can be relative or absolute.  Limiting
> to just absolute directory names would be an artificial change to existing
> constraints on path entries without any justification.  Furthermore, it
> would prevent simple use of ref files in code bases relative to project
> roots.
>
>
> Future Extensions
> =================
>
> Longer term, we should also allow *versioned* `*.ref` files that can be
> used to reference modules and packages that aren't available for ordinary
> import (since they don't follow the "name.ref" format), but are available
> to tools like `pkg_resources` to handle parallel installs of different
> versions.
>
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
> ^L
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>
>