[Distutils] A possible refactor/streamlining of PEP 517

Nathaniel Smith njs at pobox.com
Sat Jul 15 11:07:52 EDT 2017


On Sat, Jul 15, 2017 at 3:54 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15 July 2017 at 10:42, Nathaniel Smith <njs at pobox.com> wrote:
>> Hi Paul,
>>
>> We seem to have some really fundamental miscommunication here;
>> probably we should figure out what that is instead of continuing to
>> talk past each other.
>
> Agreed. Thanks for summarising your understanding. Let's see if I can
> clarify what I'm saying.

Thanks for your patience!

>> As a wild guess... can you define what an "out-of-place build" means to you?
>
> I'm going to do this without reference to your explanation, as trying
> to put things in the context of what you say is where I'm getting
> messed up. I'll comment on your explanations below.
>
> Again, I'll start with some background. My concern is where we're
> trying to deal with a user doing "pip install ." on their development
> directory. This is not a core use case for pip, and honestly I don't
> feel it's the right workflow (people wanting to work on code and
> install it as they go along should be using editable installs, IMO),
> but it is something we see people doing, and I don't want to break
> that workflow arbitrarily. Given that this is the case we're talking
> about, my experience is that working directories contain all sorts of
> clutter - small test files I knocked up, experimental changes I
> discarded, etc. That may simply reflect the way I work, but comments
> I've seen indicate that I'm not *completely* alone. So for me, the
> point here is about making sure that "pip install ." in a cluttered
> working directory results in "what the developer wants".

Oh yeah, I end up with all kinds of junk in my working directories too.

> For me the key property I'm looking for is that the developer gets
> consistent results for the build commands (i.e., build_wheel and
> build_sdist->build a wheel give the same wheel). This is important for
> a number of reasons - to avoid publishing errors where the developer
> builds a wheel and a sdist and deploys them, to ensure that tox (which
> uses sdists) gives the same results as manually running the tests,
> etc. In one of your posts you characterised the sorts of discrepancies
> I'm trying to avoid as "weird errors" and that's precisely the point -
> we get confused users raising issues and I want to avoid that
> happening.

Right.

> So, with that in mind, the distinction between an "in place" and an
> "out of place" build is that an in-place build simply trusts that the
> developer's directory will deliver consistent results, whereas an
> out-of-place build does the build in a separate location that we've
> asked the backend to ensure doesn't contain unexpected files. It has
> nothing to do with repeated builds (but see below).

...so this is where we diverge. As far as I understand it -- and I'm
pretty sure this matches all the major build systems like automake,
cmake, etc. -- the *only* difference between an in-place and
out-of-place build is where they cache the intermediate artifacts. So
if a build system is, say, scanning the source tree looking for stuff
to build... well, the source tree is the same either way, so if
they're going to pick up random junk in one case, they'll do it in the
other case as well. In fact I think they'd consider it a bug if they
didn't. Or contrariwise, if a build system is smart enough to
recognize that some files are junk and some are not, then it doesn't
matter where it's putting the intermediate files, it'll generate good
results either way.
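
As a rough illustration of the point above, here is a toy sketch. It is not how any real backend works: `toy_build`, `demo`, and the `.o` naming are made up for this example. The "build system" scans the source tree for `.py` files, and the only thing the build directory changes is where the intermediate files land.

```python
import os
import tempfile


def toy_build(source_dir, build_dir=None):
    """Hypothetical toy backend: "compile" every .py file in source_dir.

    build_dir=None means an in-place build (cache in source_dir/build);
    otherwise the intermediates go to the given out-of-place directory.
    Returns the sorted list of source files that were picked up.
    """
    cache = build_dir or os.path.join(source_dir, "build")
    os.makedirs(cache, exist_ok=True)
    built = []
    for name in sorted(os.listdir(source_dir)):
        if name.endswith(".py"):
            # Intermediate artifact: the only step that depends on build_dir.
            with open(os.path.join(cache, name + ".o"), "w") as f:
                f.write("compiled " + name)
            built.append(name)
    return built


def demo():
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        # "scratch_test.py" stands in for clutter in a working directory.
        for name in ("pkg.py", "scratch_test.py"):
            with open(os.path.join(src, name), "w") as f:
                f.write("")
        in_place = toy_build(src)
        out_of_place = toy_build(src, build_dir=out)
        return in_place, out_of_place
```

Both calls in `demo()` report the same set of files, clutter included, because the scan of the source tree is identical in both modes.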

Or for a more specific example: setuptools has the unfortunate
distinction between sdist mode that uses MANIFEST.in, and bdist mode
that uses setup.py, and skew between these is a frequent source of
problems. But IIUC the only way to exercise the MANIFEST.in path is by
actually generating an sdist. You can do in-place builds (the
default), or out-of-place builds ('build -b some/dir'), but if one is
broken then the other is broken as well.

>> For me, the distinction between an in-place and out-of-place build is,
>> ... well, first some background to make sure my terminology is clear:
>> build systems typically work by taking a source tree as input and
>> executing a series of rules to generate intermediate artifacts and
>> eventually the final artifacts. Commonly, as an optimization, they
>> have some system for caching these intermediate artifacts, so that
>> future builds can go faster (called "incremental builds"). However,
>> this optimization is often heuristic-based and therefore introduces a
>> risk: if the system re-uses a cached artifact that it should have
>> rebuilt, then this can generate a broken build.
>>
>> There are two popular strategies for storing this cache, and this is
>> what "in-place" versus "out-of-place" refers to.
>>
>> "In-place builds" have a single cache that's stored inside the source
>> tree -- often, but not always, intermingled with the source files. So
>> a classic 'make'-based build where you end up with .o files next to
>> all your .c files is an in-place build, and so is 'python setup.py
>> build' putting a bunch of .o files inside the build/ directory.
>>
>> "Out-of-place builds" instead place the cached artifacts into a
>> designated separate directory. The advantage of this is that you can
>> potentially work around limitations of the caching strategy by having
>> multiple caches and switching between them.
>>
>> [In traditional build systems the build tree concept is also often
>> intermingled with the idea of a "build configuration", like debug
>> versus optimized builds and this changes the workflow in various ways --
>> but we don't have those and it's a whole extra set of complexity so
>> let's ignore that.]
>
> For me, all of the above comes under the heading of "incremental
> builds", and I'm considering that out of scope. Specifically, pip's
> current behaviour offers no (documented) means of choosing between
> incremental or clean builds, and users who want that level of control
> should be building with the backend tools (setuptools) directly, and
> only using pip for the install step once a wheel has been built.
>
> If and when we discuss a UI in pip for requesting incremental or clean
> builds, we'd look at the implications on the backend hooks at that
> point - but I'm not sure that'll ever be something we want to do, as
> it seems like that should probably always be a use case that we'd want
> users to be working directly with the backend for (but that's just my
> opinion).

Well, so this is part of what's been making me so confused :-). AFAICT
the only reason to care about in-place versus out-of-place builds in
PEP 517 is if we want to provide explicit control over incremental
builds, which seems a weird place to be spending our complexity budget
to me too.

>> Corollaries:
>>
>> - if you're starting with a pristine source tree, then "in-place" and
>> "out-of-place" builds will produce exactly the same results, because
>> they're running exactly the same rules. (This is why I'm confused
>> about why you seem to be claiming that out-of-place builds will help
>> developers avoid bugs that happen with in-place builds... they're
>> exactly the same thing!)
>
> Agreed, but I'm concerned about build trees that *aren't* "pristine",
> insofar as they are working directories for development. All of your
> corollaries depend on the idea that you have a "pristine" build tree,
> and that's where our confusion lies, I suspect.

Ah, right -- when I said "pristine" here I was implicitly thinking
"pristine with respect to intermediate build artifacts", since as far
as I know the other kind of pristine is orthogonal to this whole
discussion.

>> - if you've done an out-of-place build in a given tree, you can return
>> to a pristine source tree by deleting the out-of-place directory and
>> making a new one, without having to deal with the build backend. If
>> you've done an in-place build in a given tree, then you need something
>> like a "make clean" rule. But if you have that, then these are
>> identical, which is why I said that it sounded like pip would be just
>> as happy with a way to do a clean build. (I'm not saying that the spec
>> necessarily needs a way to request a clean build -- this is just
>> trying to understand what the space of options actually is.)
>
> Again, agreed but irrelevant, as "pristine" is not the case that concerns me.
>
>> - if you're starting with a pristine source tree, and your goal is to
>> end up with a wheel *while keeping the original tree pristine*, then
>> some options include: (a) doing a copytree + in-place build on the
>> copy, like pip does now, (b) making an sdist and then doing an
>> in-place build
>
> Same again
>
>> - if you're not starting with a pristine source tree -- like say the
>> user went and did an in-place build here before invoking pip -- then
>> you have very few good options. Copytree + in-place build will
>> hopefully work, but there's a chance you'll pick up detritus from the
>> previous build. Out-of-tree builds might or might not work -- I've
>> given several examples of extant build systems that explicitly
>> disclaim any reliability in this case. Sdist + build the sdist is
>> probably the most reliable option here, honestly, when it's an option.
>
> Your idea of a "not pristine" tree differs from mine - having done an
> in-place build is the most innocuous example of a non-pristine tree as
> far as I'm concerned, and the easiest to deal with (make clean).
>
> * Copytree is certain *not* to work, because it copies all the things
> that make the tree not pristine.
> * Build sdist and unpack is pip's current planned approach, but Thomas
> had issues with guaranteeing that building a sdist was always
> possible. We do *not* want to have cases where pip can't build a wheel
> even though build_wheel would have worked, which means build sdist and
> unpack is a problem.

Right -- the idea in my simplified proposal is to expose just enough
to give frontends the flexibility to support both of these, plus
fallback from the latter to the former when necessary, so it's at
least no worse than what we have now.
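
The fallback described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `SdistUnavailable`, `build_wheel_with_fallback`, and the three callables are hypothetical stand-ins, not hook names or signatures from the PEP 517 draft.

```python
class SdistUnavailable(Exception):
    """Hypothetical signal that the backend cannot build an sdist right now."""


def build_wheel_with_fallback(build_sdist_tree, copy_tree, build_wheel, source_dir):
    """Frontend-side strategy: prefer sdist -> wheel, fall back to copytree.

    All three callables are hypothetical:
      build_sdist_tree(source_dir) -> path of an unpacked sdist, or raises
                                      SdistUnavailable
      copy_tree(source_dir)        -> path of a plain copy of the source tree
      build_wheel(tree)            -> path of the wheel built from `tree`
    Returns (strategy_used, wheel_path).
    """
    try:
        tree = build_sdist_tree(source_dir)
        strategy = "sdist"
    except SdistUnavailable:
        # Still produce a wheel when build_wheel alone would have worked.
        tree = copy_tree(source_dir)
        strategy = "copytree"
    return strategy, build_wheel(tree)
```

The point of keeping this logic in the frontend is that the backend only has to expose the individual operations; which path gets taken is then a frontend policy decision.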

> * Ask the backend to make a "clean" directory would work (the backend
> should know what it needs) - that was the prepare_directory hooks. But
> that got too complex.
> * Tell the backend we want a build that's isolated from the source
> directory and trust it to do the right thing is where we've currently
> ended up.
>
> Based on the current discussion, however, I now have concerns that either
>
> a) Backend developers might not understand what build_directory is
> requesting, or
> b) The PEP doesn't define the semantics of build_directory in a way
> that delivers the results I'm suggesting here
>
> Having had this discussion, and re-read the current draft of the PEP,
> I do in fact think that (b) is the case. That worries me, because I
> don't think it's just me that has made that mistake. Nick has just
> posted a message saying
>
>> Requesting an out-of-tree wheel build is then just a way for a
>> frontend to say to the backend "Hey, please build the wheel *as if*
>> you'd exported an sdist and then built that, even if you can't
>> actually export an sdist right now".

So my understanding is that that's what the build_wheel operation is
-- like, backends should *always* be generating the same wheel as they
would if they'd built an sdist, and if they don't, it's a bug.
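
One way to check that property after the fact is to compare the two wheels directly, since a wheel is a zip archive. This is only a minimal sketch: `wheel_members` and `wheel_diff` are hypothetical helper names, and comparing member names and CRCs is an assumed, fairly crude notion of "the same wheel".

```python
import zipfile


def wheel_members(path):
    """Map each archive member name to its CRC-32 checksum."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


def wheel_diff(direct_wheel, via_sdist_wheel):
    """Report how a directly built wheel differs from one built via an sdist."""
    a = wheel_members(direct_wheel)
    b = wheel_members(via_sdist_wheel)
    return {
        # Files that only the direct build shipped (e.g. working-tree clutter).
        "only_direct": sorted(set(a) - set(b)),
        # Files that only the sdist route shipped.
        "only_via_sdist": sorted(set(b) - set(a)),
        # Files present in both wheels but with different contents.
        "changed": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }
```

If all three lists come back empty, the two build routes agree on this measure; any entry points at the kind of skew that would count as a backend bug under the reading above.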

Of course bugs do happen, and distutils's fundamental architecture has
some mistakes in it, and it makes sense that pip wants to try and be
robust against them. But that requires some specific strategy for
avoiding specific bugs, and I'm not sure what you have in mind here.
To me having two different ways to run the build just seems like it
gives twice as many chances for bugs (or maybe more once you take
interactions into account).

Similarly we could have a literal boolean flag that is documented to
mean "please build the wheel *as if* you'd exported an sdist and then
built that", but what specifically would you expect backends to do
differently if this was set? Are there any circumstances where you
wouldn't want this to be set?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

