[Distutils] status check on PEP 517

Nathaniel Smith njs at pobox.com
Wed Aug 2 01:11:40 EDT 2017

On Tue, Aug 1, 2017 at 9:31 AM, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> Are we content to say that sys.path includes the source directory where
> the hook is run? Shall I prepare a PR against the PEP for that?

It doesn't matter whether sys.path includes the source directory when
the hook is run. At that point the hook can manipulate sys.path
however it likes, just like setup.py can. What matters is whether
sys.path includes the source directory before the backend is imported.

And I think that even if we keep the source directory in sys.path,
there's at least one further refinement needed. The
obvious mechanism for adding the source directory to sys.path is by
setting PYTHONPATH before invoking the sub-python that runs the build
backend. That's how we put other directories on sys.path for injecting
requirements, and there's a bit of text in the PEP talking about how
the execution environment needs to be inherited over calls like
subprocess.run([sys.executable, ...]); if this requirement also
applies to the source tree's presence in sys.path, then we *have* to
put the source tree in PYTHONPATH. But... this creates some special

Putting the source directory in PYTHONPATH is different from the
familiar 'python setup.py' behavior. When you run 'python setup.py',
the sequence of events is:

1) The interpreter bootstraps itself
2) The interpreter adds the script directory to the front of sys.path
3) The code in setup.py starts running, and can choose to either leave
sys.path alone or modify it before it imports anything with the new

But if we use PYTHONPATH to add the source tree to sys.path, then what
happens is:

1) The source tree gets added to the front of sys.path
2) The interpreter bootstraps itself
3) The frontend imports the backend
4) The frontend invokes the backend hook
5) The backend starts running, and can set up whatever environment it
likes before doing anything else
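
As a rough illustration of why PYTHONPATH is the obvious mechanism, the
sketch below shows that an entry set there is visible not just in the
sub-python but also in any python that it spawns in turn. The helper name
pythonpath_reaches_grandchild is hypothetical, and the "child" here is just
a one-liner standing in for a frontend-spawned process.

```python
import os
import subprocess
import sys


def pythonpath_reaches_grandchild(source_tree):
    """Return True if a PYTHONPATH entry is on sys.path two processes down."""
    env = dict(os.environ, PYTHONPATH=source_tree)
    # Innermost process: report whether the tree is on its sys.path.
    grandchild = (
        "import os, sys; "
        "print(any(os.path.realpath(p) == sys.argv[1] for p in sys.path))"
    )
    # Middle process: does nothing but spawn another interpreter.
    child = (
        "import subprocess, sys; "
        "subprocess.check_call("
        "[sys.executable, '-c', sys.argv[1], sys.argv[2]])"
    )
    out = subprocess.run(
        [sys.executable, "-c", child, grandchild,
         os.path.realpath(source_tree)],
        env=env, stdout=subprocess.PIPE,
        universal_newlines=True, check=True,
    ).stdout
    return out.strip() == "True"
```

Neither process touches sys.path itself; the entry is already there by the
time any of their code runs, which is exactly the property that causes
trouble below.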

So the difference is that with setup.py, there aren't any imports
between the sys.path manipulation and handing control to the script, so
if the script doesn't like its path it can easily change it. And
if there is an import problem, then it's because of a clash between
something that the package author's setup.py did, and something they
put in their package, both of which are under their control. (For
example, it's quite common to see setup.py's and other scripts do a
'sys.path.pop(0)' at the top to avoid any chance of mistakes.) The
sys.path manipulation is a convenience, but it doesn't actually affect
anything by itself.
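
For reference, a slightly more defensive spelling of that
'sys.path.pop(0)' idiom might look like the sketch below. The helper name
drop_script_dir is made up; it only drops the first entry when that entry
really is the script's own directory.

```python
import os


def drop_script_dir(path, script_file):
    """Return a copy of path without a leading entry for script_file's dir."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    # An empty first entry means the current directory; abspath handles that.
    if path and os.path.abspath(path[0]) == script_dir:
        return list(path[1:])
    return list(path)


# At the top of a setup.py this could be used as:
#     import sys
#     sys.path[:] = drop_script_dir(sys.path, __file__)
```

This works for a script precisely because it is the first code to run. A
backend that the frontend has already imported gets no such chance.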

With PYTHONPATH, the sys.path manipulation happens earlier, so you can
get collisions between the source tree and the Python bootstrap logic.
For example:

~$ mkdir temp
~$ cd temp
~/temp$ touch io.py setup.py
# No problem running a script in this dir
~/temp$ python3 setup.py
# But with PYTHONPATH...
~/temp$ PYTHONPATH=. python3 setup.py
Fatal Python error: Py_Initialize: can't initialize sys standard streams
AttributeError: module 'io' has no attribute 'OpenWrapper'

Current thread 0x00007fd862d83700 (most recent call first):
zsh: abort      PYTHONPATH=. python3 setup.py

I bet if someone filed a bug saying "I ran 'pip install foo' and got
that error message" then it would take some time to figure out what
went wrong :-).

Of course, it's unlikely that people will have io.py files sitting
around. But there's a whole list of modules that get imported
automatically during interpreter startup, and which are shadowed by
PYTHONPATH. For example, I just checked Github's bigquery data and
found dozens of repos that have a file called 'site.py' in their root
-- probably because they're a web site or something. But of course if
we use PYTHONPATH then this shadows Python's internal site.py that's
imported for bootstrapping...
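
To get a feel for how long that list is, something like the following
sketch could be pointed at a source tree. The function names
startup_modules and shadowing_candidates are hypothetical. It only reports
name matches: built-in modules are filtered out because they are never
looked up on sys.path, but frozen modules are not, so whether a given match
actually breaks startup depends on the Python version.

```python
import os
import subprocess
import sys


def startup_modules():
    """Top-level names a fresh interpreter imports before running any code."""
    code = "import sys; print('\\n'.join(sys.modules))"
    out = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return {name.split(".")[0] for name in out.split()}


def shadowing_candidates(source_tree):
    """Names in the root of source_tree that match a startup module."""
    # Built-in modules are never looked up on sys.path, so skip them. Frozen
    # modules are not filtered here, so treat the result as candidates only.
    risky = startup_modules() - set(sys.builtin_module_names)
    found = set()
    for entry in os.listdir(source_tree):
        full = os.path.join(source_tree, entry)
        name, ext = os.path.splitext(entry)
        is_module = ext == ".py" and os.path.isfile(full)
        is_package = os.path.isfile(os.path.join(full, "__init__.py"))
        if (is_module or is_package) and name in risky:
            found.add(name)
    return found
```

Calling shadowing_candidates on a checkout that has a site.py in its root
would be expected to flag 'site'.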

We could avoid the interpreter bootstrap problems by adding language
to the spec saying that yes, the source tree goes at the beginning of
sys.path, BUT this MUST NOT be done using PYTHONPATH, and update the
environment inheritance language to specify that this is a special
case that is *not* inherited by child pythons. Kind of annoying when
the only argument for automatically putting the source tree on
sys.path is to simplify the spec, but ok...
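
One way a frontend could do this, sketched under assumptions: pass the
source tree as an argument and insert it into sys.path inside the child,
after the interpreter has bootstrapped, instead of exporting it through the
environment. BOOTSTRAP, run_in_source_tree and PROBE are invented names,
and the exec() stands in for "import the backend and call its hook"; this
is not what any existing frontend does.

```python
import subprocess
import sys

# Hypothetical bootstrap: put the tree on sys.path in-process, then hand over.
# A real frontend would import the backend and call its hook instead of exec().
BOOTSTRAP = "import sys; sys.path.insert(0, sys.argv[1]); exec(sys.argv[2])"


def run_in_source_tree(source_tree, code):
    """Run code in a child whose sys.path starts with source_tree."""
    return subprocess.run(
        [sys.executable, "-c", BOOTSTRAP, source_tree, code],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout


# Stand-in for a backend hook: report on this process, then on a child of it.
PROBE = (
    "import subprocess, sys\n"
    "print(sys.path[0] == sys.argv[1], flush=True)\n"
    "subprocess.check_call([sys.executable, '-c',\n"
    "    'import sys; print(sys.argv[1] in sys.path)', sys.argv[1]])\n"
)
```

Running run_in_source_tree(tree, PROBE) is expected to report True for the
child and False for the python it spawns, assuming the tree isn't already
on PYTHONPATH. That's the "special case that is not inherited" in
miniature.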

And then of course you can also get collisions between the source tree
and the import of the build backend, as discussed earlier in the
thread. Obviously this can't be fixed, since the whole point is to
support cases where the build backend is in the source tree.

If this does happen then... I dunno, maybe the backend shouldn't have
imported that stuff, or maybe the source tree shouldn't have had a
directory called that? But too late, it's out there on PyPI and pip is
getting bug reports and it isn't even clear whose fault it is or
who should fix it. If it were setup.py the obvious solution would be
to add a 'sys.path.pop(0)', but here we've eliminated that option.

I get the impression that a lot of people are like "well, that's a low
probability event, and the rest of the time it's really handy, so
whatever, it's worth it". And I agree it's a low probability event,
but what I'm not sure people are fully realizing is... it's actually
not very handy either :-). The *only* people who benefit from putting
the source tree in sys.path are package authors who want to ship some
custom one-off build backend inside their package. If you're writing a
build backend that ships via PyPI, or you're writing a package that
uses an existing build system pulled from PyPI, then having build
frontends put the source directory on sys.path provides *zero* value
to balance this small-but-non-zero risk.

I think there's a real chance that if we ship this enabled by default
in the name of "simplicity", then not only will we first have to add
the special case alluded to above, but we'll eventually decide that we
need to provide an opt-out flag, and then we'll have to communicate to
projects that almost all of them actually want this opt-out flag,
because the default is tuned for the minority of projects who are
doing something special. I don't know about you, but I am *so tired*
of trying to explain to folks about "best practices" where the default
is wrong.

OTOH in the alternative approach where this is opt-in, then the only
folks who have to worry about this are the ones writing one-off build
backends, and seriously once you've made the life choice to implement
a one-off build backend then adding a line to pyproject.toml is really
the least of your worries. People with special needs should be the
ones paying the special tax.


> On Sun, Jul 30, 2017, at 02:12 PM, Nick Coghlan wrote:
>> On 30 July 2017 at 02:46, Nathaniel Smith <njs at pobox.com> wrote:
>> > Or am I worrying about a non-issue and it's fine if flit imports click from
>> > the source tree?
>> Don't worry about it too much, as the problem here isn't really any
>> worse than it is for normal runtime dependencies of any other project
>> that relies on having the current directory first in sys.path. It just
>> so happens that the project in question in this case is a Python
>> project's build system.
>> Due to the preference for a flat module namespace as the default,
>> there are plenty of ways to hit name shadowing problems in Python, and
>> as Donald notes, build systems have other motives to vendor their
>> dependencies rather than installing them normally.
>> Just switching the path order as has been suggested also doesn't solve
>> the problem, as it merely inverts the issue: having "some_name"
>> installed in site-packages would break source installations of
>> packages that expected to be able to import "some_name" from their own
>> root directory.
>> If the problem does come up in practice, then there are a number of
>> ways for affected projects to work around it in their project
>> directory structure:
>> 1. Use a top-level "src" directory (we may want to reserve "src" on PyPI)
>> 2. Use a top-level "tools" directory (we may want to reserve "tools" on
>> PyPI)
>> 3. Add a leading or trailing underscore to the local directory name
>> (as while that's legal for Python imports, it's prohibited for PyPI
>> project names, and hence will often sidestep naming conflicts with
>> published packages)
>> Beyond that, the only approaches I'm aware of that systematically
>> avoid this kind of problem at the language design level are to either
>> use URL-based imports (ala Java or Go), or else to have separate
>> syntax for "system-only" and "local resolution permitted" imports (ala
>> C and C++), and Guido opted not to pursue either of those strategies
>> for Python.
>> Cheers,
>> Nick.
>> --
>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

Nathaniel J. Smith -- https://vorpus.org
