Re: [Distutils] A possible refactor/streamlining of PEP 517

July 15, 2017

Hi Paul,

We seem to have some really fundamental miscommunication here;
probably we should figure out what that is instead of continuing to
talk past each other. As a wild guess... can you define what an
"out-of-place build" means to you?

For me, the distinction between an in-place and out-of-place build is,
... well, first some background to make sure my terminology is clear:
build systems typically work by taking a source tree as input and
executing a series of rules to generate intermediate artifacts and
eventually the final artifacts. Commonly, as an optimization, they
have some system for caching these intermediate artifacts, so that
future builds can go faster (called "incremental builds"). However,
this optimization is often heuristic-based and therefore introduces a
risk: if the system re-uses a cached artifact that it should have
rebuilt, then this can generate a broken build.
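
To make that risk concrete, here is a minimal sketch of a timestamp-based
staleness check. It is a toy under stated assumptions, not how any
particular build system works: the file names, the upper-casing "compile"
step, and the helper names are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(source: Path, artifact: Path) -> bool:
    """mtime heuristic: rebuild only if the artifact is missing or older."""
    if not artifact.exists():
        return True
    return source.stat().st_mtime > artifact.stat().st_mtime


def build(source: Path, artifact: Path) -> str:
    """Toy 'compile' step (hypothetical): artifact = upper-cased source."""
    if needs_rebuild(source, artifact):
        artifact.write_text(source.read_text().upper())
    return artifact.read_text()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "mod.c"
        artifact = Path(tmp) / "mod.o"

        # First build: no cached artifact yet, so the rule runs.
        source.write_text("version one")
        os.utime(source, (1_000, 1_000))
        first = build(source, artifact)
        os.utime(artifact, (2_000, 2_000))

        # The source content changes, but its timestamp ends up older than
        # the artifact's (for example, a file restored with its old mtime).
        source.write_text("version two")
        os.utime(source, (1_500, 1_500))
        second = build(source, artifact)  # heuristic wrongly reuses the cache
        return first, second
```

Calling demo() returns the same stale artifact text for both builds, even
though the source changed in between: the heuristic reused a cached
artifact that it should have rebuilt.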

There are two popular strategies for storing this cache, and this is
what "in-place" versus "out-of-place" refers to.

"In-place builds" have a single cache that's stored inside the source
tree -- often, but not always, intermingled with the source files. So
a classic 'make'-based build where you end up with .o files next to
all your .c files is an in-place build, and so is 'python setup.py
build' putting a bunch of .o files inside the build/ directory.

"Out-of-place builds" instead place the cached artifacts into a
designated separate directory. The advantage of this is that you can
potentially work around limitations of the caching strategy by having
multiple caches and switching between them.
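
As a rough illustration of the two strategies, the sketch below runs the
same toy rule twice and varies only where the cached artifacts go. The
directory names, the upper-casing "compile" step, and the helper names
are hypothetical stand-ins, not any real backend's layout.

```python
import tempfile
from pathlib import Path


def build(source_dir: Path, cache_dir: Path) -> dict:
    """Run the same toy rule for every .c file; only the cache location varies."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for src in sorted(source_dir.glob("*.c")):
        obj = cache_dir / (src.stem + ".o")
        obj.write_text(src.read_text().upper())  # stand-in for compilation
        outputs[src.name] = obj.read_text()
    return outputs


def tree_listing(root: Path) -> list:
    """All files under root, as sorted POSIX-style relative paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        results = {}
        for mode in ("in-place", "out-of-place"):
            source = base / mode / "src"
            source.mkdir(parents=True)
            (source / "a.c").write_text("int a;")
            if mode == "in-place":
                cache = source / "build"        # cache lives inside the source tree
            else:
                cache = base / mode / "cache"   # cache lives in a separate directory
            results[mode] = (build(source, cache), tree_listing(source))
        return results
```

In this sketch both modes produce identical outputs from a pristine tree.
The only difference is that the in-place run leaves build/a.o inside the
source tree, while the out-of-place run leaves the source tree untouched.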

[In traditional build systems the build tree concept is also often
intermingled with the idea of a "build configuration", like debug
versus optimized builds, and this changes the workflow in various ways --
but we don't have those and it's a whole extra set of complexity so
let's ignore that.]

Corollaries:

- if you're starting with a pristine source tree, then "in-place" and
"out-of-place" builds will produce exactly the same results, because
they're running exactly the same rules. (This is why I'm confused
about why you seem to be claiming that out-of-place builds will help
developers avoid bugs that happen with in-place builds... they're
exactly the same thing!)

- if you've done an out-of-place build in a given tree, you can return
to a pristine source tree by deleting the out-of-place directory and
making a new one, without having to deal with the build backend. If
you've done an in-place build in a given tree, then you need something
like a "make clean" rule. But if you have that, then these are
identical, which is why I said that it sounded like pip would be just
as happy with a way to do a clean build. (I'm not saying that the spec
necessarily needs a way to request a clean build -- this is just
trying to understand what the space of options actually is.)

- if you're starting with a pristine source tree, and your goal is to
end up with a wheel *while keeping the original tree pristine*, then
some options include: (a) doing a copytree + in-place build on the
copy, like pip does now, or (b) making an sdist and then doing an
in-place build in the unpacked sdist.

- if you're not starting with a pristine source tree -- like say the
user went and did an in-place build here before invoking pip -- then
you have very few good options. Copytree + in-place build will
hopefully work, but there's a chance you'll pick up detritus from the
previous build. Out-of-tree builds might or might not work -- I've
given several examples of extant build systems that explicitly
disclaim any reliability in this case. Sdist + build the sdist is
probably the most reliable option here, honestly, when it's an option.
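
The last corollary can be sketched like this. It is a toy under loudly
stated assumptions: MANIFEST is a hypothetical stand-in for "the files an
sdist would contain", the "reuse the artifact if it exists" rule is a
deliberately crude incremental-build heuristic, and none of the helper
names correspond to real pip or backend APIs.

```python
import shutil
import tempfile
from pathlib import Path

MANIFEST = ["a.c"]  # hypothetical: the files that belong to the project


def build_in_place(tree: Path) -> str:
    """Toy incremental build: reuse build/a.o if it already exists."""
    obj = tree / "build" / "a.o"
    if not obj.exists():
        obj.parent.mkdir(exist_ok=True)
        obj.write_text((tree / "a.c").read_text().upper())
    return obj.read_text()


def via_copytree(source: Path, work: Path) -> str:
    """Copy the whole tree, leftovers included, then build in the copy."""
    shutil.copytree(source, work)
    return build_in_place(work)


def via_sdist_like_copy(source: Path, work: Path) -> str:
    """Copy only the manifest-listed files, then build in the copy."""
    work.mkdir()
    for name in MANIFEST:
        shutil.copy2(source / name, work / name)
    return build_in_place(work)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        source = base / "project"
        source.mkdir()
        (source / "a.c").write_text("old code")
        build_in_place(source)                   # user builds in place first
        (source / "a.c").write_text("new code")  # then edits the source
        return (
            via_copytree(source, base / "copy"),
            via_sdist_like_copy(source, base / "sdist"),
        )
```

Here the copytree route carries the stale build/a.o along and returns the
old artifact, while the manifest-based copy starts from a clean tree and
rebuilds from the current source.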

Does that make sense? Does it... help explain any of the ways we're
talking past each other?

-n

On Fri, Jul 14, 2017 at 1:58 PM, Paul Moore <p.f.moore@gmail.com> wrote:
...
On 14 July 2017 at 19:24, Nathaniel Smith <njs@pobox.com> wrote:
...
...
...
- pip doesn't trust build systems to properly support incremental
builds, so it wants to force it to throw away the build artifacts
after every build
It's less that, and more pip wanting to ensure that the default
publisher experience is reasonably close to the default end user
experience, in order to increase the likelihood that publishers ship
working sdists and wheel files, even if they haven't learned about the
full suite of available pre-release testing tools yet (or have chosen
not to use them).
This is exactly the opposite of what Paul says in his message posted
at about the same time as yours... AFAICT his argument is that build
artifact leakage is exactly what pip is worried about, and if anything
he's annoyed at me because he thinks I'm downplaying the problem :-).
(My impression is that Donald's position might be closer to what you
wrote, though.)
No, you've misunderstood my point completely. What I was trying to say
is *exactly* the same as what Nick said - we're just expressing it in
different terms.
I'm not sure how we can resolve this - from my perspective, everyone
else[1] is in agreement with the latest revision of the PEP. You have
some concerns, but I'm failing to understand what they are, except
that they seem to be based on a confusion about what "the rest of us"
want (I'm not trying to frame this as an "us against you" argument,
just trying to point out that you seem to be seeing conflicts in our
position that the rest of us don't). I'd really like to understand
your specific concerns, but we seem to be spending too much time
talking at cross purposes at the moment, and not making much progress.
[1] Donald's still thinking about it, but I'm fairly sure the proposal
aligns with what he's after, at least in broad terms.
So, maybe I can address some points I *think* you're making, and we
can try to understand each other that way.
1. Incremental builds. My understanding is that the current proposal
says *nothing* about incremental builds. There's nothing stopping
backends supporting them. On the other hand, pip at the moment doesn't
have a way for the user to request an incremental build, or request a
complete rebuild. As far as I know, there's nothing in pip's
documented behaviour that even suggests whether builds are incremental
(that's not to say we'll break what support there is arbitrarily, just
that it's not formalised). Solving the question of incremental builds
was, as far as I recall, agreed by everyone as something we'd defer to
a future PEP.
2. The need for out of place builds. Pip needs them - to solve a
certain class of issue that we've seen in real bug reports. You don't
seem convinced that we do need them, but I'm not sure what else I can
say here. You also seem concerned that if we get out of place builds,
that means that pip will use them to break functionality that you need
(specifically, incremental builds, as far as I can see). All I can say
to that is that you seem very unwilling to trust the pip developers to
take care to support our users' use cases. We're not going to
arbitrarily break users' workflows - and if you don't believe that I
can't convince you otherwise.
3. You are concerned that having to support in-place and out-of-place
builds is a significant burden for backends. Thomas has said it's OK
for flit (it's not a good fit, but he can support it). Daniel has
confirmed enscons can handle it. Nick has surveyed a number of other
build systems (that don't currently have backends, but could) and sees
no major problems. Can you give any concrete example of a case where a
realistic backend would find the requirement a burden? (Please don't
claim incremental builds as the reason - the PEP explicitly says that
backends can cache data from previous builds in the build_dir, which
allows incremental builds if the backend wants them).
The key point about *all* of the proposals we've had (build via sdist,
have a "prepare a build directory" hook, and the build directory
parameter) is to ensure that the wheels that are built when the
frontend requests them are the same as those that would be obtained if
the user were to build a sdist and then install from that. That's not
a case of "not trusting the backend", or "protecting against a bug in
the frontend or backend" - it's about making sure the *developer*
didn't make a mistake when modifying his project. That class of
mistake *does* happen - we've seen it reported by our end users.
To address some of your specific comments (apologies if I
misunderstand anything, it's just because I don't follow your basic
position at all, so things that seem obvious to you are confusing me
badly).
...
You're trying to shut me down by making these pronouncements about
what PEP 517 is going to be, and yet you don't seem to even understand
what requirements people are talking about. I'm sorry if that's rude,
and I'm not angry at you personally or anything, but I am frustrated
and I don't understand why I seem to have to keep throwing myself in
front of these ideas that seem so poorly motivated.
From my POV, you keep flagging incremental builds as the big issue.
It's not a matter of not understanding your requirements, it's more
that we all (and I *thought* that included you) agreed that
incremental builds were out of scope for this PEP.
If you want to expand the scope of the PEP to include incremental
builds you'll need to make that clear - but I think most people are
too burned out to be sympathetic to that.
...
...
And that's been the main discovery of the last round of discussions -
we'd been talking past each other, and what we actually wanted was for
the backend interface to accurately model the in-place/out-of-tree
discussion that has been common to most build systems at least since
autotools (hence my catalog of them earlier in the thread).
This statement seems just... completely wrong to me. I can't see any
way that the in-place/out-of-place distinction matches anything we
were talking about earlier. Have we been reading the same threads?
There was the whole thing about pip copying trees, of course, but
that's a completely different thing. The motivation there is that they
want to make sure that two builds don't accidentally influence each
other.
No it's not. It's to make sure that *things that are present in the
source directory but aren't specified as being part of the project*
(so they won't appear in the sdist) don't influence the build.
...
If they don't trust the build system, then this requires
copying to a new tree (either via copytree or sdist). If they do trust
the build system, then there are other options -- you can do an
in-place build, or have a flag saying "please 'make clean' before this
build", or something. So out-of-place builds are either insufficient
or unnecessary for what pip wants. Copying the source tree and
out-of-place builds are completely different things with different use
cases; they happen to both involve external directories, but that's
mostly a false similarity I think.
This makes no sense to me, because you seem to have misunderstood the
motivation behind out of place builds. (It's not about trusting the
build system or wanting to do a clean non-incremental build).
Agreed that copying the whole source tree and doing an out of place
build are different. I don't think there's anyone suggesting "copy the
whole source tree" satisfies the requirements we have?
...
And as for flit, it doesn't even have a distinction between in-place
and out-of-place builds...
And yet Thomas has stated that he's OK with implementing this
requirement of the PEP for flit - so you can't use flit as an example
for your arguments.
...
Forcing all build systems to support both in-place builds and
out-of-place builds adds substantial additional complexity, introduces
multiple new failure modes, and AFAICT is not very well suited to the
problems people have been worrying about.
Please provide a concrete example of a build backend for which this is
true. We can't make any progress based on speculative assumptions
about what backends will find hard.
...
...
1. What if backends get their out-of-tree support wrong?
2. What happens if re-using build directories isn't tested properly?
3. What happens if an upgrade breaks incremental builds?
4. Do we enforce not modifying the source directory?
...
These first 4 answers might be reasonable if it wasn't for the fact
that the *entire alleged motivation* for this feature is to reduce the
chance of accidental breakage. You can't justify it like that and then
hand-wave away all the new potential sources of accidental breakage
that it introduces...
As I've said, you seem to have misunderstood what the motivation for
the feature is. It's preserving the equivalence of build_wheel and
building via sdist (in the face of possible developer errors in what
they have stored in the source directory), not about protecting
against implementation bugs in pip or backends, nor is it in any way
related to incremental builds.
...
Or, well, that sounds pretty unusable, so maybe instead we would want
to go with the alternative: "Backends MUST support free switching
between in-place and out-of-place builds in the same directory" -- but
that's a whole 'nother likely source of weird problems, and breaks the
idea that this is something that existing build systems all support
already.
All of the "weird problems" you're suggesting might exist seem to me
to be cases where building a wheel directly would produce a different
result from producing a sdist and then producing a wheel from that
sdist. (Because that's essentially what switching from in-place to
out-of-place *means*).
Are you explicitly requiring that build systems should be allowed to
do that? That it's OK for a build system to produce a sdist and a
wheel that *won't work the same* when used for installations? I'm
unable to believe that's what you're suggesting.
...
...
7. Why include in-place support in the API then?
Because some folks (including you) would like to include incremental
build support, and because if we don't support it explicitly backends
will still have to deal with the "wheel_directory == build_directory"
case.
Out-of-tree builds as currently specified are required to handle
incremental out-of-tree builds... maybe that's a mistake, but that's
what the text says.
Wait. Are you now proposing that the current out-of-place build
support is OK for you and you'd be happy to drop support for
*in-place* builds???
...
I don't understand the second half of your sentence.
...
9. Why not wait and then add a new backend capability requirement later?
Waiting to add the requirement won't provide us with any more data
than we already have, but may give backend implementors the impression
they don't need to care about out-of-tree build support. This is also
our first, last, and only chance to make out-of-tree build support a
*mandatory* backend requirement that frontends can rely on - if we add
it later, it will necessarily only be optional.
AFAICT making it optional is better for everyone, so not sure why
that's seen as a bad thing.
Analysis:
- If you're an eager backend dev who loves this stuff, you'll
implement it anyway, so it doesn't matter whether it's optional
- If you're a just-trying-to-hack-something-together backend dev maybe
constrained by some ugly legacy build system, then you'd much rather
it be optional because then you don't have to deal with hacking up
some fake "out-of-tree build" using copytree, and maybe getting it
wrong
- If you're pip, then AFAICT you're more worried about lazy backend
devs screwing things up than anything else, in which case you don't
want them hacking up their own fake "out-of-tree build" using
copytree, you want to take responsibility for that so you know that
you get it right
If you're pip, you *can't implement this* because you need the backend
to tell you what files are needed. Full tree copies are a significant
performance problem, and don't actually deliver what we need anyway.
And we can't handle it via "create a sdist" because we can't guarantee
that the build_sdist hook won't fail in cases where build_wheel would
have worked.
The fact that pip can't do this without backend assistance is
*precisely* the reason we need one of the solutions we've debated
here. And the build_directory parameter (out of place builds) is the
solution that the backend developers (Thomas and Daniel) have accepted
as the best option.
I hope I haven't misunderstood any of your points too badly. But if I
have, just ignore my comments and focus on the initial part of my
email. That's the key point anyway.
Paul
-- 
Nathaniel J. Smith -- https://vorpus.org