A minor detail for pip strategy option #2 is that sdists do not have to have PKG-INFO.

On Mon, Jul 17, 2017 at 9:02 AM Nathaniel Smith <njs@pobox.com> wrote:
Hi all,

I happened to talk to Donald on IRC last night, and said I'd write up
some notes on that discussion. (He's also seen these notes, but is
partially-offline dealing with jury duty.) [Edit: but apparently still
replying to email on the list, so some of this repeats things he's
posted since I sent it to him to look at, but oh well :-). That's life
in the fast-paced world of F/OSS development I guess.]

One thing we tried to get a handle on is what, exactly, the
constraints are that PEP 517 is trying to deal with.

As far as pip goes, these are some of the situations that Donald's
worried about handling gracefully when it comes to 'pip install .':

a) The source tree is on read-only media

b) People do an editable install in a source tree, and then follow it
up with a regular install ('pip install -e . && pip install .').
Apparently this can cause problems because of the residual metadata
from the editable install confusing setuptools.

c) Someone does 'pip install .' on a complex package, then discovers
that while the build succeeded, it's missing some optional feature
because some necessary system library was missing, so they install
that library and re-try the 'pip install .'. Will the build backend
notice that the library has appeared, or does it only do environment
sniffing the first time it's run in a given tree?

d) Mounting the same source tree into multiple different docker
containers: 'docker run -v .:/io ubuntu pip install /io && docker run
-v .:/io alpine pip install /io'. Will the build backend notice that these
two systems have incompatible C toolchains and C ABIs, so you can't
share .o files between them? In principle a build system could notice
this by detecting that the compiler and system headers have changed,
but in traditional build systems it's common to intentionally *not*
include these in your build cache key, because you don't want to
rebuild the world after every apt upgrade. (For example, gcc provides
the -M switch to generate make rules to trigger rebuilds on header
changes, and the -MM switch to do the same but ignoring changes in
system headers; lots of projects choose to use -MM. This is related to
the thing where incremental builds are traditionally a bit sloppy and
intended for experts only.)

e) Build systems that have different sources of truth for what goes
into an sdist versus what goes into a wheel, and thus can easily end
up in a situation where a direct VCS->wheel build gives a different
result than a VCS->sdist->wheel build. The most prominent offender
here is distutils/setuptools with its MANIFEST.in mess, but Donald is
nervous that other systems might also reproduce this mistake.

f) And finally Donald feels "it's just more hygienic to have ``pip
install .`` not modify ., similarly to how [he] would be upset if
``pip install foo-1.0.tar.gz`` modified foo-1.0.tar.gz in some way".
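
To make the concern in (d) a bit more concrete, here is a minimal
Python sketch of a build cache key. The names cache_key and
toolchain_id are made up for illustration and are not taken from any
real build system. Leaving the toolchain out of the key is similar in
spirit to the -MM choice: the key is the same in both containers, so
stale .o files would be reused.

```python
import hashlib


def cache_key(source_bytes, toolchain_id=None):
    """Hypothetical cache key for one compiled object file.

    toolchain_id is an assumed string describing the compiler and
    system headers (for example "gcc-6/glibc" vs "gcc-6/musl").
    """
    digest = hashlib.sha256(source_bytes)
    if toolchain_id is not None:
        # Including the toolchain forces a rebuild when it changes.
        digest.update(b"\0" + toolchain_id.encode("utf-8"))
    return digest.hexdigest()


source = b"int main(void) { return 0; }\n"

# Key ignores the toolchain: both containers share one cache entry.
sloppy_ubuntu = cache_key(source)
sloppy_alpine = cache_key(source)

# Key includes the toolchain: each container gets its own entry.
strict_ubuntu = cache_key(source, "gcc/glibc")
strict_alpine = cache_key(source, "gcc/musl")
```

The trade-off is the one described in (d): the strict key is safe
across containers but rebuilds everything whenever the toolchain
identifier changes.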

Of course, no system can avoid every problem; the overall goal here is
harm reduction and minimizing spurious bugs filed on pip, not
perfection.

None of these cases arise for 'pip install name' or 'pip install
sdist.tar.gz'; it's really only 'pip install .' on a user-provided
source tree.

----------

For reference, here's my analysis of how these particular desiderata
relate to some possible approaches:

- Pip's current system for handling builds from existing source trees
(copytree + setup.py bdist_wheel -or- setup.py install) handles (a),
(c), (d), (f), but not (b) or (e), unless someone has previously done
an in-place build in the tree, in which case it handles (a) and (f)
but not (b), (c), (d), or (e). Which is kind of unfortunate, since (a)
and (f) are probably the least important items.

- When sdist->unpack->wheel is possible, it automatically handles all
of these cases.

- If a build backend *only* does out-of-place builds (like meson) and
*does not support* in-place or editable installs, then having an
out-of-place build_wheel hook automatically takes care of everything
except (e).

- Otherwise, an out-of-place build_wheel hook handles (a), (c),
(d), (f) but not (b) or (e), unless someone has previously done an
in-place build in the tree, in which case it handles (a) and (f) but
not (b), (c), (d), or (e) (assuming that the build system can get
confused between artifacts left behind by in-place builds when doing
an out-of-place build, which appears to be a common feature of
existing systems).

  Notice that this means that as far as this score card is concerned,
out-of-place build_wheel is identical to pip's current copytree +
in-place build_wheel. The key insight here is that an out-of-tree
build is nicer for the *next* person to use this source tree, but when
pip runs what it cares about is whether the *last* build was in-tree
or out-of-tree, and that's not something that it has any control over.

- If you have a system that supports in-place builds but not
build_sdist, then copytree + a "clean build" hook would cover
everything except (e). Notice that, for the same reason as above, it
doesn't matter whether the clean build is in-tree or out-of-tree, just
that it has the ability to avoid any interference from junk left
inside the tree by any previous build.
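
The score card above can be written down as data, which makes the
"identical" observation easy to check. This is only a restatement of
the list in Python; the approach labels and the SCORECARD name are
mine, not anything pip or PEP 517 defines.

```python
ALL_CASES = set("abcdef")

# (approach, state of the source tree) -> cases handled gracefully
SCORECARD = {
    ("copytree + in-place build", "clean tree"): set("acdf"),
    ("copytree + in-place build", "after in-place build"): set("af"),
    ("out-of-place build_wheel", "clean tree"): set("acdf"),
    ("out-of-place build_wheel", "after in-place build"): set("af"),
    ("sdist -> unpack -> wheel", "any"): set(ALL_CASES),
    ("out-of-place-only backend", "any"): ALL_CASES - {"e"},
    ("copytree + clean-build hook", "any"): ALL_CASES - {"e"},
}


def unhandled(approach, tree_state):
    """Return the cases an approach does not handle, sorted."""
    return sorted(ALL_CASES - SCORECARD[(approach, tree_state)])


def same_score(approach_a, approach_b):
    """True if two approaches score the same in every tree state they share."""
    states_a = {s for (a, s) in SCORECARD if a == approach_a}
    states_b = {s for (a, s) in SCORECARD if a == approach_b}
    return states_a == states_b and all(
        SCORECARD[(approach_a, s)] == SCORECARD[(approach_b, s)]
        for s in states_a
    )
```

For example, same_score("copytree + in-place build", "out-of-place
build_wheel") is true, which is the point made above about the two
being indistinguishable on this score card.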

----------

Anyway. Back to things we discussed.

As far as we understand, the way that flit comes into this is that
flit cannot generate an sdist:
- from a VCS directory when the VCS tools are unavailable
- from an unpacked sdist

(Thomas, is that right?)

So on that basis, we considered a few different strategies that pip
might take for handling 'pip install .':

Pip Strategy Option 1: First make sure that the setuptools backend
build_wheel hook takes responsibility for doing the
sdist->unpack->wheel dance when running from a VCS checkout. And then,
now that we don't have to worry about setuptools messing things up,
make pip's strategy be to just do a plain in-place or out-of-place
build_wheel.

As you can probably guess from the list above, Donald didn't feel
comfortable with this, at least to start with -- he thinks that it
might be possible for pip to transition to this in the future if it
turns out that new backends are generally of high quality, but until
we have more experience, he's nervous. This makes sense to me too -- I
think we can plausibly mandate that build backends take care of (e)
(MANIFEST.in problems -- that's basically the idea of making
setuptools's build backend responsible for doing the
sdist->unpack->wheel dance), and (a) (read-only source tree) is a rare
edge case that can reasonably be handled by a special case in the
frontend, but all the other items actually are regular occurrences in
modern build systems.

Really he wants pip to go via sdist 99% of the time, because that's
the case that Just Works, and then have whatever fallback is necessary
for the 1%. In some sense it doesn't even matter that much what the
fallback is, because if this rare edge case is a bit slow or a bit
more error prone, well... it's a rare edge case. Even for flit, 99% of
the time people will just be installing wheels, not downloading and
manually unpacking sdists, so as long as it *works* then it doesn't
have to be perfect. Anyway, this leads to the last two possible
strategies we discussed:

Pip Strategy Option 2: Check to see if a PKG-INFO file is present. If
so, do an in-place build_wheel; otherwise, do
build_sdist->unpack->build_wheel. The key thing here is that because
it's pip checking for the PKG-INFO instead of querying the build
backend, then every source tree has to either *be* an sdist, or be
able to *produce* an sdist; there's no fallback for non-sdist trees that
can't produce sdists. (At least if you want to support 'pip install
.')
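
As a rough sketch of what option 2 amounts to on the frontend side
(this is not pip's actual code; choose_build_steps and the step names
are invented for illustration):

```python
import os
import tempfile


def choose_build_steps(source_dir):
    """Hypothetical frontend decision for 'pip install <source_dir>'.

    Assumption: the presence of a top-level PKG-INFO file is what
    marks the tree as an unpacked sdist.
    """
    if os.path.isfile(os.path.join(source_dir, "PKG-INFO")):
        # Looks like an unpacked sdist: build the wheel in place.
        return ["build_wheel"]
    # Anything else must be able to produce an sdist first.
    return ["build_sdist", "unpack", "build_wheel"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tree:
        print(choose_build_steps(tree))
        open(os.path.join(tree, "PKG-INFO"), "w").close()
        print(choose_build_steps(tree))
```

Note that the backend is never consulted here, which is exactly why a
tree that is neither an sdist nor able to produce one has nowhere to
go.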

Pip Strategy Option 3: Ask the build backend whether it can do a
build_sdist. If so, do build_sdist->unpack->build_wheel; otherwise, do
copytree->in-place build_wheel, or out-of-place build_wheel, or
whatever, it doesn't matter that much to pip.

As far as Donald is concerned, either of these options would be fine.
For "option 2" (having pip check for PKG-INFO), the main potential
flaw we noted is that while it's OK with flit not being able to build
an sdist from an sdist, it will error out in the case where you try to
do 'pip install .' on a flit VCS checkout and flit can't find the VCS
tools. Donald is fine with that, on the grounds that oh well, if
you're working with a VCS checkout and need VCS tools to get
consistent results then printing an error message telling people to
install them is a fine outcome. (In particular, he observes that flit
*can* produce sdist-inconsistent results in this case: if there's a
.py file present in the source but not added to the VCS, and you do
build_wheel, then he thinks that the .py file will be installed iff
the VCS tools are unavailable, and would never be included in any
sdist. Thomas, is this right?) However, he thinks that Thomas objected
to raising an error in this case, and wants to allow flit to work.
(Thomas, is that right?) So possibly that's an argument for preferring
"option 3" (ask the backend if it can do an sdist).

Another point about option 2 that we didn't discuss but that I'll
mention now is that it does kinda enforce that all build backends
support building sdists in general, which might be good or bad
depending on what you think about that.

----------

A few more general points that Donald made:

- He doesn't care much about in-place vs out-of-place in general, we
can provide either or both as far as he's concerned. It doesn't matter
to pip.

- But, if we do support both in-place and out-of-place, then he
strongly feels that there should *not* be any semantic difference
between them "because it just adds another "path" and possible
inconsistency", plus if a build backend is able to do something smart
it should just do it all the time. So I think this is a point of
disagreement between Donald and Nick.

- If pip is going to go with "option 3" (i.e., attempting to do
build_sdist->unpack->build_wheel, unless the build backend says it
can't), then he strongly feels that this fallback should be triggered
by some *explicit* signal from the backend saying that build_sdist is
not supported; in general he wants build_sdist raising an error to be
an immediate failure, rather than triggering the fallback path.
Basically what he's saying is that he wants something like my
NotImplemented proposal or equivalent.
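
A minimal sketch of the distinction in that last bullet, assuming
(purely for illustration) that the explicit signal is build_sdist
returning the NotImplemented singleton. The backend classes and
plan_install are hypothetical stand-ins, not real hooks or pip
internals:

```python
class SdistCapableBackend:
    def build_sdist(self, sdist_dir):
        return "pkg-1.0.tar.gz"


class NoSdistBackend:
    def build_sdist(self, sdist_dir):
        # Explicit "I can't do this here" signal.
        return NotImplemented


class BrokenBackend:
    def build_sdist(self, sdist_dir):
        raise RuntimeError("sdist build failed")


def plan_install(backend, sdist_dir="dist"):
    """Hypothetical frontend logic for option 3."""
    # Any exception propagates: an error is an immediate failure,
    # never a trigger for the fallback path.
    result = backend.build_sdist(sdist_dir)
    if result is NotImplemented:
        return "fallback: build_wheel directly from the source tree"
    return "build_sdist -> unpack -> build_wheel (from %s)" % result
```

Here only NoSdistBackend reaches the fallback; BrokenBackend's
RuntimeError is left to surface to the user.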

Regarding this last point: I want to point out this would also mean
that when building via pip, the fallback could *only* be triggered by
backends that explicitly ask for it, so they're the only ones who
could possibly end up exposing their users to any weirdness in the
fallback path, and it's up to them to figure out how to deal with it.
In particular, it means that setuptools and flit are both handled fine
regardless of what we decide about in-place vs. out-of-place, because
setuptools will always go via the sdist path and thus be built from a
pristine directory, and flit doesn't care about in-place vs.
out-of-place anyway. Basically, as long as pip has some kind of
build_sdist and build_wheel, and build_sdist can signal "not
implemented" in a clean way, then pip is happy.

My analysis above suggests that beyond this, there's *one* possible
extension that pip might want: In the future, it might turn out that
there is a build backend that is like flit in that it can't always
generate an sdist, but is like setuptools in that it can produce
broken results when run in an unclean directory. If that happens, and
if we decide that it's important for such a backend to play nicely
with pip *and* to support incremental builds *and* to do both of these
through the PEP 517 hook interface, then it might be useful to have
something like a "make_clean_then_build_wheel" hook for pip to call.
My feeling is we can probably defer this for now though? This is like
3 edge cases on top of each other and we don't even know if such a
backend will ever exist.

----------

So none of the above actually has much to do with the in-place vs
out-of-place debate; basically the conclusion is that pip doesn't
care. But we still have to pick something.

Donald and I both have some preference for only having one way to do
it; given that the semantics are supposed to be identical, why bother
making everyone implement both? So: should we start with in-place (and
possibly add out-of-place as an option later for specific use cases
like build pipelines), or should we start by mandating out-of-place?

Neither of us has a particularly strong opinion on this. I guess I
have a mild preference for starting with in-place, because:

- Out-of-place is more complicated to implement than in-place, and
it's going to be difficult to explain to build system authors why
we're forcing them to do this extra work for some obscure cases they
might not care about and that we haven't articulated well.

- It might seem like only supporting out-of-place builds would
simplify pip's life, because the problems above that are caused by
in-place build detritus would go away. But this isn't really true,
because even if we don't *expose* in-place build functionality through
the PEP 517 hooks, then most build systems are still going to
implement this, and that means we still need to be prepared to handle
the case where the user has done an in-place build, which is the
tricky one. Plus, editable installs intrinsically leave detritus in
the source tree.

- For me, one of the major goals of PEP 517 is to make life easier for
*really* gnarly projects -- and here I'm not thinking of like numpy or
scipy, they have it easy; I'm thinking of the folks who have, like, a
dozen different C libraries with build systems from the 80s vendored
inside their source tree. I'm nervous that forcing these folks with
complicated embedded multi-package builds to support out-of-tree
builds would be a significant burden and present an obstacle to
uptake. Out-of-place builds are definitely a more advanced feature and
the direction good build systems move in, but I don't want to leave
anyone behind.

- We have zero users for this functionality right now, which is
usually not a good situation for trying to write a spec.

- I don't think it will be a big deal to add out-of-place support
later if we want to.

But really it's all that other stuff that's important to sort out first.


--
Nathaniel J. Smith -- https://vorpus.org