Re: [Distutils] A possible refactor/streamlining of PEP 517

15 Jul 2017

On Sat, Jul 15, 2017 at 3:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
...
On 15 July 2017 at 10:42, Nathaniel Smith <njs@pobox.com> wrote:
...
Hi Paul,
We seem to have some really fundamental miscommunication here;
probably we should figure out what that is instead of continuing to
talk past each other.
Agreed. Thanks for summarising your understanding. Let's see if I can
clarify what I'm saying.
Thanks for your patience!
...
...
As a wild guess... can you define what an "out-of-place build" means to you?
I'm going to do this without reference to your explanation, as trying
to put things in the context of what you say is where I'm getting
messed up. I'll comment on your explanations below.
Again, I'll start with some background. My concern is where we're
trying to deal with a user doing "pip install ." on their development
directory. This is not a core use case for pip, and honestly I don't
feel it's the right workflow (people wanting to work on code and
install it as they go along should be using editable installs, IMO),
but it is something we see people doing, and I don't want to break
that workflow arbitrarily. Given that this is the case we're talking
about, my experience is that working directories contain all sorts of
clutter - small test files I knocked up, experimental changes I
discarded, etc. That may simply reflect the way I work, but comments
I've seen indicate that I'm not *completely* alone. So for me, the
point here is about making sure that "pip install ." in a cluttered
working directory results in "what the developer wants".
Oh yeah, I end up with all kinds of junk in my working directories too.
...
For me the key property I'm looking for is that the developer gets
consistent results for the build commands (i.e., build_wheel and
build_sdist->build a wheel give the same wheel). This is important for
a number of reasons - to avoid publishing errors where the developer
builds a wheel and a sdist and deploys them, to ensure that tox (which
uses sdists) gives the same results as manually running the tests,
etc. In one of your posts you characterised the sorts of discrepancies
I'm trying to avoid as "weird errors" and that's precisely the point -
we get confused users raising issues and I want to avoid that
happening.
Right.
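The property Paul describes is that build_wheel and build_sdist->build a wheel give the same wheel. Once both wheels exist, this can be checked mechanically. The following is a minimal sketch and not anything pip actually does. The function name `wheel_diff` is made up. It compares member names and bytes, and it skips each wheel's RECORD file on the assumption that RECORD only restates hashes of the other members.

```python
import zipfile


def wheel_diff(wheel_a, wheel_b):
    """Return (only_in_a, only_in_b, differing) for two wheel files.

    Compares archive member names and their bytes. The RECORD file is
    skipped because it only lists hashes of the other members.
    """
    def members(path):
        with zipfile.ZipFile(path) as zf:
            return {
                name: zf.read(name)
                for name in zf.namelist()
                if not name.endswith(".dist-info/RECORD")
            }

    a, b = members(wheel_a), members(wheel_b)
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    differing = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])
    return only_in_a, only_in_b, differing
```

A CI job could fail whenever any of the three lists is non-empty. That would catch a scratch file that ends up in the directly built wheel but not in the sdist-derived one.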
...
So, with that in mind, the distinction between an "in place" and an
"out of place" build is that an in-place build simply trusts that the
developer's directory will deliver consistent results, whereas an
out-of-place build does the build in a separate location that we've
asked the backend to ensure doesn't contain unexpected files. It has
nothing to do with repeated builds (but see below).
...so this is where we diverge. As far as I understand it -- and I'm
pretty sure this matches all the major build systems like automake,
cmake, etc. -- the *only* difference between an in-place and
out-of-place build is where they cache the intermediate artifacts. So
if a build system is, say, scanning the source tree looking for stuff
to build... well, the source tree is the same either way, so if
they're going to pick up random junk in one case, they'll do it in the
other case as well. In fact I think they'd consider it a bug if they
didn't. Or contrariwise, if a build system is smart enough to
recognize that some files are junk and some are not, then it doesn't
matter where it's putting the intermediate files, it'll generate good
results either way.

Or for a more specific example: setuptools has the unfortunate
distinction between sdist mode that uses MANIFEST.in, and bdist mode
that uses setup.py, and skew between these is a frequent source of
problems. But IIUC the only way to exercise the MANIFEST.in path is by
actually generating an sdist. You can do in-place builds (the
default), or out-of-place builds ('build -b some/dir'), but if one is
broken then the other is broken as well.
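Here is a toy illustration of that argument. `toy_build` is hypothetical and far simpler than any real backend. It globs the source tree for `.py` files and copies them into a build directory, which may sit inside the tree (in-place) or elsewhere (out-of-place). The scan is the same either way, so a stray scratch file is picked up in both modes.

```python
import shutil
from pathlib import Path


def toy_build(source_dir, build_dir):
    """Copy every .py file under source_dir into build_dir.

    Returns the relative paths copied. build_dir may be inside the
    source tree (in-place) or outside it (out-of-place).
    """
    source = Path(source_dir).resolve()
    build = Path(build_dir).resolve()
    copied = []
    # sorted() materializes the scan before anything is written.
    for path in sorted(source.rglob("*.py")):
        # Don't treat our own cached outputs as sources.
        if build in path.parents:
            continue
        rel = path.relative_to(source)
        dest = build / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        copied.append(rel.as_posix())
    return copied
```

Whether `build_dir` is `src / "build"` or a separate temporary directory, the returned list is the same, junk included.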
...
...
For me, the distinction between an in-place and out-of-place build is,
... well, first some background to make sure my terminology is clear:
build systems typically work by taking a source tree as input and
executing a series of rules to generate intermediate artifacts and
eventually the final artifacts. Commonly, as an optimization, they
have some system for caching these intermediate artifacts, so that
future builds can go faster (called "incremental builds"). However,
this optimization is often heuristic-based and therefore introduces a
risk: if the system re-uses a cached artifact that it should have
rebuilt, then this can generate a broken build.
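The classic caching heuristic is make's timestamp rule. Here is a minimal sketch (the function name is made up): rebuild when the artifact is missing or older than any declared input. The risk described above appears when an input that matters isn't declared, such as a header the rule doesn't list. Editing that header never triggers a rebuild.

```python
import os


def needs_rebuild(artifact, inputs):
    """make-style check: True if artifact is missing or older than any input.

    Only the listed inputs are consulted, so an undeclared dependency
    can change without triggering a rebuild (a stale cache hit).
    """
    if not os.path.exists(artifact):
        return True
    built_at = os.path.getmtime(artifact)
    return any(os.path.getmtime(src) > built_at for src in inputs)
```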
There are two popular strategies for storing this cache, and this is
what "in-place" versus "out-of-place" refers to.
"In-place builds" have a single cache that's stored inside the source
tree -- often, but not always, intermingled with the source files. So
a classic 'make'-based build where you end up with .o files next to
all your .c files is an in-place build, and so is 'python setup.py
build' putting a bunch of .o files inside the build/ directory.
"Out-of-place builds" instead place the cached artifacts into a
designated separate directory. The advantage of this is that you can
potentially work around limitations of the caching strategy by having
multiple caches and switching between them.
[In traditional build systems the build tree concept is also often
intermingled with the idea of a "build configuration", like debug
versus optimized builds, and this changes the workflow in various ways --
but we don't have those and it's a whole extra set of complexity so
let's ignore that.]
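Put another way, the two strategies differ only in where a given cached artifact lives. The sketch below assumes a C-style `.c` -> `.o` rule, and `object_path` is a hypothetical helper rather than part of any real build system.

```python
from pathlib import Path
from typing import Optional


def object_path(source_root: Path, source_file: Path,
                build_dir: Optional[Path] = None) -> Path:
    """Where the cached .o for source_file is stored.

    In-place (build_dir is None): next to the .c file, inside the tree.
    Out-of-place: the same relative path, mirrored under build_dir.
    """
    rel = source_file.relative_to(source_root).with_suffix(".o")
    if build_dir is None:
        return source_root / rel
    return build_dir / rel
```

Switching between caches just means passing a different `build_dir`. The rules, and therefore the set of sources they read, don't change.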
For me, all of the above comes under the heading of "incremental
builds", and I'm considering that out of scope. Specifically, pip's
current behaviour offers no (documented) means of choosing between
incremental or clean builds, and users who want that level of control
should be building with the backend tools (setuptools) directly, and
only using pip for the install step once a wheel has been built.
If and when we discuss a UI in pip for requesting incremental or clean
builds, we'd look at the implications on the backend hooks at that
point - but I'm not sure that'll ever be something we want to do, as
it seems like that should probably always be a use case that we'd want
users to be working directly with the backend for (but that's just my
opinion).
Well, so this is part of what's been making me so confused :-). AFAICT
the only reason to care about in-place versus out-of-place builds in
PEP 517 is if we want to provide explicit control over incremental
builds, which seems a weird place to be spending our complexity budget
to me too.
...
...
Corollaries:
- if you're starting with a pristine source tree, then "in-place" and
"out-of-place" builds will produce exactly the same results, because
they're running exactly the same rules. (This is why I'm confused
about why you seem to be claiming that out-of-place builds will help
developers avoid bugs that happen with in-place builds... they're
exactly the same thing!)
Agreed, but I'm concerned about build trees that *aren't* "pristine",
insofar as they are working directories for development. All of your
corollaries depend on the idea that you have a "pristine" build tree,
and that's where our confusion lies, I suspect.
Ah, right -- when I said "pristine" here I was implicitly thinking
"pristine with respect to intermediate build artifacts", since as far
as I know the other kind of pristine is orthogonal to this whole
discussion.
...
...
- if you've done an out-of-place build in a given tree, you can return
to a pristine source tree by deleting the out-of-place directory and
making a new one, without having to deal with the build backend. If
you've done an in-place build in a given tree, then you need something
like a "make clean" rule. But if you have that, then these are
identical, which is why I said that it sounded like pip would be just
as happy with a way to do a clean build. (I'm not saying that the spec
necessarily needs a way to request a clean build -- this is just
trying to understand what the space of options actually is.)
Again, agreed but irrelevant, as "pristine" is not the case that concerns me.
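Here are the two routes back to a tree free of intermediate artifacts, sketched with hypothetical helpers. `clean_in_place` stands in for a `make clean` rule. It only removes artifact types it has been told about, and the default suffix list is an assumption. The out-of-place version needs no such knowledge.

```python
import shutil
from pathlib import Path


def clean_out_of_place(build_dir):
    """Discard the whole cache and start over with an empty directory."""
    build_dir = Path(build_dir)
    shutil.rmtree(build_dir, ignore_errors=True)
    build_dir.mkdir(parents=True)


def clean_in_place(source_dir, artifact_suffixes=(".o", ".so", ".pyc")):
    """Emulate 'make clean': delete known artifacts mixed in with sources.

    Returns the names removed; anything the rule doesn't know about stays.
    """
    removed = []
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix in artifact_suffixes:
            path.unlink()
            removed.append(path.name)
    return removed
```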
...
- if you're starting with a pristine source tree, and your goal is to
end up with a wheel *while keeping the original tree pristine*, then
some options include: (a) doing a copytree + in-place build on the
copy, like pip does now, (b) making an sdist and then doing an
in-place build
Same again
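Options (a) and (b) can be sketched as frontend code. This assumes a `backend` object exposing the hooks as they ended up in the accepted PEP 517: `build_sdist(sdist_directory)` and `build_wheel(wheel_directory)`. Each returns the basename of the file it wrote, and each is called with the source tree as the working directory. Several things are simplified. The `build_directory` argument discussed in this thread is left out, and the helper names are made up. Real frontends also run hooks in a subprocess with an isolated build environment, which is skipped here.

```python
import contextlib
import os
import shutil
import tarfile
import tempfile
from pathlib import Path


@contextlib.contextmanager
def working_directory(path):
    # PEP 517 hooks are called with the source tree as the cwd.
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)


def wheel_via_copytree(backend, source_dir, wheel_dir):
    """Option (a): copy the whole tree, then build in place on the copy."""
    wheel_dir = os.path.abspath(wheel_dir)
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "src"
        shutil.copytree(source_dir, copy)  # copies any clutter too
        with working_directory(copy):
            return backend.build_wheel(wheel_dir)


def wheel_via_sdist(backend, source_dir, wheel_dir):
    """Option (b): build an sdist, unpack it, build the wheel from that."""
    wheel_dir = os.path.abspath(wheel_dir)
    with tempfile.TemporaryDirectory() as tmp:
        with working_directory(source_dir):
            sdist_name = backend.build_sdist(tmp)
        with tarfile.open(os.path.join(tmp, sdist_name)) as tf:
            if hasattr(tarfile, "data_filter"):
                tf.extractall(tmp, filter="data")
            else:
                tf.extractall(tmp)
        # An sdist unpacks to a single {name}-{version}/ directory.
        unpacked = os.path.join(tmp, sdist_name[: -len(".tar.gz")])
        with working_directory(unpacked):
            return backend.build_wheel(wheel_dir)
```

With a tree containing a stray scratch file, route (a) carries it into the wheel. Route (b) only sees what the backend chose to put in the sdist.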
...
- if you're not starting with a pristine source tree -- like say the
user went and did an in-place build here before invoking pip -- then
you have very few good options. Copytree + in-place build will
hopefully work, but there's a chance you'll pick up detritus from the
previous build. Out-of-tree builds might or might not work -- I've
given several examples of extant build systems that explicitly
disclaim any reliability in this case. Sdist + build the sdist is
probably the most reliable option here, honestly, when it's an option.
Your idea of a "not pristine" tree differs from mine - having done an
in-place build is the most innocuous example of a non-pristine tree as
far as I'm concerned, and the easiest to deal with (make clean).
* Copytree is certain *not* to work, because it copies all the things
that make the tree not pristine.
* Build sdist and unpack is pip's current planned approach, but Thomas
had issues with the fact that building a sdist isn't always guaranteed
to be possible. We do *not* want to have cases where pip can't build a wheel
even though build_wheel would have worked, which means build sdist and
unpack is a problem.
Right -- the idea in my simplified proposal is to expose just enough
to give frontends the flexibility to support both of these, plus
fallback from the latter to the former when necessary, so it's at
least no worse than what we have now.
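On the frontend side, that fallback could look roughly like the sketch below. The accepted PEP 517 lets a backend say it can't produce an sdist by raising the exception it exposes as `backend.UnsupportedOperation`. In that case, the PEP tells frontends to fall back to building the wheel directly. The function name and the two strategy callables (for instance a copytree route and an sdist route) are hypothetical.

```python
def build_wheel_preferring_sdist(backend, source_dir, wheel_dir,
                                 via_sdist, via_copytree):
    """Try sdist -> wheel first; fall back to a direct build of a copy.

    via_sdist and via_copytree are callables taking
    (backend, source_dir, wheel_dir) and returning the wheel's basename.
    """
    try:
        return via_sdist(backend, source_dir, wheel_dir)
    except backend.UnsupportedOperation:
        # The backend says it can't make an sdist here (e.g. something
        # needed only for sdists is missing). Don't let that block a
        # wheel that build_wheel could have produced on its own.
        return via_copytree(backend, source_dir, wheel_dir)
```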
...
* Ask the backend to make a "clean" directory would work (the backend
should know what it needs) - that was the prepare_directory hooks. But
that got too complex.
* Tell the backend we want a build that's isolated from the source
directory and trust it to do the right thing is where we've currently
ended up.
Based on the current discussion, however, I now have concerns that either
a) Backend developers might not understand what build_directory is
requesting, or
b) The PEP doesn't define the semantics of build_directory in a way
that delivers the results I'm suggesting here
Having had this discussion, and re-read the current draft of the PEP,
I do in fact think that (b) is the case. That worries me, because I
don't think it's just me that has made that mistake. Nick has just
posted a message saying
...
Requesting an out-of-tree wheel build is then just a way for a
frontend to say to the backend "Hey, please build the wheel *as if*
you'd exported an sdist and then built that, even if you can't
actually export an sdist right now".
So my understanding is that that's what the build_wheel operation is
-- like, backends should *always* be generating the same wheel as they
would if they'd built an sdist, and if they don't, it's a bug.

Of course bugs do happen, and distutils's fundamental architecture has
some mistakes in it, and it makes sense that pip wants to try and be
robust against them. But that requires some specific strategy for
avoiding specific bugs, and I'm not sure what you have in mind here.
To me having two different ways to run the build just seems like it
gives twice as many chances for bugs (or maybe more once you take
interactions into account).

Similarly we could have a literal boolean flag that is documented to
mean "please build the wheel *as if* you'd exported an sdist and then
built that", but what specifically would you expect backends to do
differently if this was set? Are there any circumstances where you
wouldn't want this to be set?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org