[Import-SIG] PEP proposal: Per-Module Import Path
brett at python.org
Fri Jul 19 16:51:05 CEST 2013
If this can lead to the deprecation of .pth files then I support the idea,
but I think there are technical issues in terms of implementation that have
not been thought through yet. This is going to require an implementation
(even if it isn't in importlib._bootstrap but as a subclass of
importlib.machinery.FileFinder or something) to see how you plan to make
all of this work before this PEP can move beyond this SIG.
On Thu, Jul 18, 2013 at 6:10 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> Nick talked me into writing this PEP, so blame him for the idea. <wink> I
> haven't had a chance to polish it up, but the content communicates the
> proposal well enough to post here. Let me know what you think. Once some
> consensus is reached I'll commit the PEP and post to python-dev. I have a
> rough implementation that I'll put online when I get a chance.
> If Guido is not interested maybe Brett would like to be BDFL-Delegate. :)
> PEP: 4XX
> Title: Per-Module Import Path
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow <ericsnowcurrently at gmail.com>
> Nick Coghlan <ncoghlan at gmail.com>
> BDFL-Delegate: ???
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 17-Jul-2013
> Python-Version: 3.4
> Post-History: 18-Jul-2013
> Path-based import of a module or package involves traversing ``sys.path``
> or a package path to locate the appropriate file or directory(s).
> Redirecting from there to other locations is useful for packaging and
> for virtual environments. However, in practice such redirection is
> currently either `limited or fragile <Existing Alternatives>`_.
> This proposal provides a simple filesystem-based method to redirect from
> the normal module search path to other locations recognized by the
> import system. This involves one change to path-based imports, adds one
> import-related file type, and introduces a new module attribute. One
> consequence of this PEP is the deprecation of ``.pth`` files.
> One of the problems with virtual environments is that you are likely to
> end up with duplicate installations of lots of common packages, and
> keeping them up to date can be a pain.
> One of the problems with archive-based distribution is that it can be
You say "One of the problems" at the start of 3/4 of the paragraphs in this
section. Variety is the spice of life. =) Try "Another problem is that for
> tricky to register the archive as a Python path entry when needed
> without polluting the path of applications that don't need it.
How is this unique to archive-based distributions compared to any other
scenario where all distributions are blindly added to sys.path?
> One of the problems with working directly from a source checkout is
> getting the relevant source directories onto the Python path, especially
> when you have multiple namespace package fragments spread across several
> subdirectories of a large repository.
E.g., a source checkout for the coverage.py project might be stored in the
directory ``coveragepy``, but the actual source code is stored in
``coveragepy/coverage``, requiring ``coveragepy`` to be on sys.path in
order to access the package.
> The `current solutions <Existing Alternatives>`_ all have their flaws.
> Reference files are intended to address those deficiencies.
> Change to the Import System
> Currently, during `path-based import` of a module, the following happens
> for each `path entry` of `sys.path` or of the `__path__` of the module's
> 1. look for `<path entry>/<name>/__init__.py` (and other supported
> * return `loader`;
> 2. look for `<path entry>/<name>.py` (and other supported suffixes),
> * return loader;
> 3. look for `<path entry>/<name>/`,
> * extend namespace portions path.
Please capitalize the first letter of each bullet point (here and the rest
of the PEP). Reads better since they are each separate sentences.
> Once the path is exhausted, if no `loader` was found and the `namespace
> portions` path is non-empty, then a `NamespaceLoader` is returned with that
> This proposal inserts a step before step 1 for each `path entry`:
> 0. look for `<path entry>/<name>.ref`
Why .ref? Why not .path?
> a. get loader for `<fullname>` (absolute module name) using path found
> in `.ref` file (see below) using `the normal mechanism`[link to language
> * stop processing the path entry if `.ref` file is empty;
You should clarify how you plan to "get loader". You will have to find the
proper finder as well in case the .ref file references a zip file or
something which requires a different finder than the one which came across
the .ref file.
> b. check for `NamespaceLoader`,
> * extend namespace portions path;
> c. otherwise, return loader.
> Note the following consequences:
> * if a ref file is found, it takes precedence over module files and
> package directories under the same path entry (see `Empty Ref Files as
> * that holds for empty ref files also;
> * the loader for a ref file, if any, comes from the full import system
> (i.e. `sys.meta_path`) rather than just the path-based import system;
> * `.ref` files can indirectly provide fragments for namespace packages.
This ramification for namespace packages makes the proposed semantic change
a bit trickier than you are suggesting, since you are essentially doing a
recursive path entry search. And is that possible? If I have a .ref file
that refers to a path which itself has a .ref, will that then lead to
another search? I mean, it seems like you are going to be doing ``return
importlib.find_loader(fullname, paths_found_in_ref) if paths_found_in_ref
else None`` from within a finder which finds a .ref file, which itself
would support a recursive search.
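
To make the recursion question concrete, here is a minimal sketch of such a
redirecting lookup. It is only an illustration under stated assumptions: the
helper names (``read_ref_entries``, ``find_spec_via_ref``) are hypothetical,
it uses ``importlib.machinery.PathFinder.find_spec`` (the spec-based API
added in Python 3.4) in place of ``importlib.find_loader()``, it searches
only the path-based system rather than all of ``sys.meta_path``, and it does
not merge namespace portions.

```python
import os
from importlib.machinery import PathFinder


def read_ref_entries(ref_file):
    """Hypothetical helper: list the path entries named in a .ref file."""
    base = os.path.dirname(ref_file)
    with open(ref_file, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    # Relative entries are resolved against the directory holding the file.
    return [os.path.normpath(os.path.join(base, line))
            for line in lines if line and not line.startswith("#")]


def find_spec_via_ref(fullname, path_entry, _seen=None):
    """Search one path entry, following <name>.ref redirects recursively."""
    seen = set() if _seen is None else _seen
    name = fullname.rpartition(".")[2]
    ref_file = os.path.join(path_entry, name + ".ref")
    if not os.path.isfile(ref_file):
        # No redirect: ordinary path-based search of this single entry.
        return PathFinder.find_spec(fullname, [path_entry])
    real = os.path.realpath(ref_file)
    if real in seen:
        return None  # guard against .ref files that point at each other
    seen.add(real)
    for target in read_ref_entries(ref_file):
        spec = find_spec_via_ref(fullname, target, seen)
        if spec is not None:
            return spec
    # Empty .ref file, or nothing found through it: the .ref file still
    # shadows any module or package of the same name in this entry.
    return None
```

Each redirect target is searched by calling the same function again, so a
.ref file found under a target is followed in turn; the cycle guard is an
addition of this sketch, not something the draft specifies.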
Everything below should come before the import changes. It's hard to follow
what is really being proposed for semantics without knowing e.g. that .ref
files can have 0 or more paths and not just a single path, etc.
> Reference Files
> A new kind of file will live alongside package directories and module
> source files: reference files. These files have the following
> * named `<module name>.ref` in contrast to `<module name>.py` (etc.) or
> `<module name>/`;
> * placed under `sys.path` entries or package path (just like modules and
> Reference File Format
> The contents of a reference file will conform to the following format:
> * contain zero or more path entries, just like sys.path;
> * one path entry per line;
> * path entry order is preserved;
> * may contain comment lines starting with "#", which are ignored;
> * may contain blank lines, which are ignored;
> * must be UTF-8 encoded.
> Directory Path Entries
> Directory names are by far the most common type of path entry. Here is
> how they are constrained in reference files:
> * may be absolute or relative;
> * must be forward slash separated regardless of platform;
> * each must be the parent directory where the module will be looked for.
> To be clear, reference files (just like `sys.path`) deliberately reference
> the *parent* directory to be searched (rather than the module or package
> directory). So they work transparently with `__pycache__` and allow
> searching for `.dist-info <PEP 376>`_ directories through them.
> Relative directory names will be resolved based on the directory
> containing the ref file, rather than the current working directory.
> Allowing relative directory names allows you to include sensible ref files
> in a source repo.
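
As an illustration of the format and the relative-path rule above, a reader
for such files could look roughly like the following. This is a sketch, not
the proposed implementation; ``parse_ref_file`` is a hypothetical name, and
normalizing the joined path with ``os.path.normpath`` is an assumption of
this example.

```python
import os


def parse_ref_file(ref_path):
    """Return the path entries of a .ref file, in file order.

    Hypothetical helper following the draft format: UTF-8 text, one
    forward-slash-separated entry per line, blank lines and lines
    starting with "#" ignored.
    """
    base = os.path.dirname(os.path.abspath(ref_path))
    entries = []
    with open(ref_path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # os.path.join() keeps an absolute entry unchanged and
            # resolves a relative one against the .ref file's directory,
            # not the current working directory.
            entries.append(os.path.normpath(os.path.join(base, line)))
    return entries
```

An empty file, or one holding only comments, yields an empty list, which is
the "marker" case discussed in the next section.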
> Empty Ref Files as Markers
> Handling `.ref` files first allows for the use of empty ref files as
> markers to indicate "this is not the module you are looking for". Here are
> two situations where that helps.
"Here" -> "There"
> First, an empty ref file helps resolve conflicts between script names and
> package names. When the interpreter is started with a filename, the
> directory of that script is added to the front of `sys.path`. This may be
> a problem for later imports where the intended module or package is on a
> regular path entry.
> If an import references the script's name, the file will get run again by
> the import system as a module (only `__main__` was added to `sys.modules`
> earlier) [PEP 395]_. This is a further problem if you meant to import a
> module or package in another path entry.
> The presence of an empty ref file in the script's directory would
> essentially render it invisible to the import system. This problem and
> solution apply for all of the files or directories in the script's
> Second, the namespace package mechanism has a side-effect: a directory
> without a __init__.py may be incorrectly treated as a namespace package
> fragment. The presence of an empty ref file indicates such a directory
> should be ignored.
> A Module Attribute to Expose Contributing Ref Files
> Knowing the origin of a module is important when tracking down problems,
> particularly import-related ones. Currently, that entails looking at
> `<module>.__file__` and `<module.__package__>.__path__` (or `sys.path`).
> With this PEP there can be a chain of ref files in between the currently
> available path and a module's __file__. Having access to that list of ref
> files is important in order to determine why one file was selected over
> another as the origin for the module. When an unexpected file gets used
> for one of your imports, you'll care about this!
> In order to facilitate that, modules will have a new attribute:
> `__indirect__`. It will be a tuple comprised of the chain of ref files, in
> order, used to locate the module's __file__. An empty tuple or a tuple with
> one item will be the most common case. An empty tuple indicates that no ref
> files were used to locate the module.
This complicates things even further. How are you going to pass this info
along a call chain through find_loader()? Are we going to have to add
find_loader3() to support this (nasty side-effect of using tuples instead
of types.SimpleNamespace for the return value)? Some magic second value or
type from find_loader() which flags that the values in the iterable are from
a .ref file and not any other possible place? This requires an API change and
there isn't any mention of how that would look or work.
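
One possible shape for that, purely as a sketch: have the lookup return a
``types.SimpleNamespace`` that carries the chain of ref files alongside the
result, so a caller could later copy it to ``__indirect__``. The function
names and the ``spec``/``indirect`` attribute names here are hypothetical,
and the search uses ``importlib.machinery.PathFinder.find_spec`` rather than
``find_loader()``.

```python
import os
from importlib.machinery import PathFinder
from types import SimpleNamespace


def _ref_targets(ref_file):
    """Hypothetical helper: path entries listed in a .ref file."""
    base = os.path.dirname(ref_file)
    with open(ref_file, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    return [os.path.normpath(os.path.join(base, line))
            for line in lines if line and not line.startswith("#")]


def locate(fullname, path_entry, chain=()):
    """Return SimpleNamespace(spec=..., indirect=...) or None."""
    name = fullname.rpartition(".")[2]
    ref_file = os.path.join(path_entry, name + ".ref")
    if not os.path.isfile(ref_file):
        spec = PathFinder.find_spec(fullname, [path_entry])
        if spec is None:
            return None
        # 'indirect' lists every ref file traversed, outermost first.
        return SimpleNamespace(spec=spec, indirect=chain)
    if ref_file in chain:
        return None  # cycle guard (an assumption of this sketch)
    for target in _ref_targets(ref_file):
        found = locate(fullname, target, chain + (ref_file,))
        if found is not None:
            return found
    return None
```

A module found without any redirect gets ``indirect == ()``, matching the
"empty tuple" case described in the draft.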
> XXX are these useful?
Yes, if you change this to pip or setuptools and also make it so it
shows how you could point to version-specific distributions.
> Top-level module (`import spam`)::
> # use the system installed module
> Submodule (`python -m myproject.tests`)::
> Multiple Path Entries::
> # fall back to the old one
> Chained Ref Files::
> # use the system installed module
> # use the clone
> Reference Implementation
> A reference implementation is available at <TBD>.
> XXX double-check zipimport support
> Deprecation of .pth Files
> The `site` module facilitates the composition of `sys.path`. As part of
> that, `.pth` files are processed and entries added to `sys.path`. Ref
> files are intended as a replacement.
> XXX also deprecate .pkg files (see pkgutil.extend_path())?
> Consequently, `.pth` files will be deprecated.
Link to "Existing Alternatives" discussion as to why this deprecation is
> Deprecation Schedule
> 1. documented: 3.4,
> 2. warnings: 3.5 and 3.6,
> 3. removal: 3.7
> XXX Deprecate sooner?
> Existing Alternatives
> .pth Files
> `*.pth` files have the problem that they're global: if you add them to
> `site-packages`, they will be processed at startup by *every* Python
> application run using that Python installation.
"... thanks to them being processed by the site module instead of by the
import system and individual finders."
> This is an undesirable side effect of the way `*.pth` processing is
> defined, but can't be changed due to backwards compatibility issues.
> Furthermore, `*.pth` files are processed at interpreter startup...
That's a moot point; .ref files can be as well if they are triggered as
part of an import.
A bigger concern is that they execute arbitrary Python code which could be
viewed as an unexpected security risk. Some might complain about the
difficulty then of loading non-standard importers, but that really should
be the duty of the code performing the import and not the distribution
itself; IOW I would argue that it is up to the user to get things in line
to use a distribution in the format they choose to use it instead of the
distribution dictating how it should be bundled.
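
A small demonstration of the code-execution point, using only the standard
``site`` module: a line in a .pth file that starts with ``import`` is
executed, while other lines are treated as directories to append to
``sys.path``. The function name, the ``demo.pth`` file name and the
``PTH_DEMO_RAN`` environment variable are made up for this example, and the
function restores ``sys.path`` afterwards.

```python
import os
import site
import sys
import tempfile


def demo_pth_side_effects():
    """Process a throwaway .pth file and report what the site module did."""
    with tempfile.TemporaryDirectory() as sitedir:
        os.mkdir(os.path.join(sitedir, "extra_lib"))
        with open(os.path.join(sitedir, "demo.pth"), "w",
                  encoding="utf-8") as f:
            # A plain line names a directory to add to sys.path.
            f.write("extra_lib\n")
            # A line starting with "import" is executed as Python code.
            f.write("import os; os.environ['PTH_DEMO_RAN'] = '1'\n")
        before = list(sys.path)
        try:
            site.addsitedir(sitedir)
            added = [p for p in sys.path if p not in before]
        finally:
            sys.path[:] = before  # undo the global change
        executed = os.environ.pop("PTH_DEMO_RAN", None) == "1"
    return added, executed
```

Calling ``demo_pth_side_effects()`` returns the ``sys.path`` entries that
were added and whether the embedded statement ran.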
> .egg-link files
> `*.egg-link` files are much closer to the proposed `*.ref` files. The
> difference is that `*.egg-link` files are designed to work with
> `pkg_resources` and `distribution names`, while `*.ref files` are designed
> to work with package and module names as an automatic part of the import
> Actual symlinks have the problem that they aren't really practical on
> Windows, and also that they don't support non-versioned references to
> versioned `dist-info` directories.
> Design Alternatives
> Ignore Empty Ref Files
> An empty ref file would be ignored rather than effectively stopping the
> processing of the path entry. This loses the benefits outlined above of
> empty ref files as markers.
> ImportError for Empty Ref Files
> An empty ref file would result in an ImportError. The only benefit to
> this would be to disallow empty ref files and make it clear when they are
> Handle Ref Files After Namespace Packages
> Rather than handling ref files first, they could be handled last. Thus
> they would have lower priority than namespace package fragments. This
> would be insignificantly more backward compatible. However, as with
> ignoring empty ref files, handling them last would prevent their use as
> markers for ignoring a path entry.
> Send Ref File Path Through Path Import System Only
> As indicated above, the path entries in a ref file are passed back through
> the metapath finders to get the loader. Instead we could use just the
> path-based import system. This would prevent metapath finders from having
> a chance to handle the module under a different path.
> Restrict Ref File Path Entries to Directories
> Rather than allowing anything for the path entries in a ref file, they
> could be restricted to just directories. This is by far the common case.
> However, it would add complexity without any justification for not
> allowing metapath importers a chance at the module under a new path.
> Restrict Directories in Ref File Path Entries to Absolute
> Directory path entries in ref files can be relative or absolute. Limiting
> to just absolute directory names would be an artificial change to existing
> constraints on path entries without any justification. Furthermore, it
> would prevent simple use of ref files in code bases relative to project
> Future Extensions
> Longer term, we should also allow *versioned* `*.ref` files that can be
> used to reference modules and packages that aren't available for ordinary
> import (since they don't follow the "name.ref" format), but are available
> to tools like `pkg_resources` to handle parallel installs of different
> ..  ...
> This document has been placed in the public domain.
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8