[Distutils] A possible refactor/streamlining of PEP 517

Nick Coghlan ncoghlan at gmail.com
Sun Jul 16 22:44:18 EDT 2017


On 16 July 2017 at 18:24, Nathaniel Smith <njs at pobox.com> wrote:
> On Sat, Jul 15, 2017 at 11:27 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 16 July 2017 at 14:56, Nathaniel Smith <njs at pobox.com> wrote:
>>> But... that is not what the in-place/out-of-place distinction means in
>>> normal usage, it's not the distinction that any of those build systems
>>> you were surveying implement, and it's not the distinction specified
>>> in the current PEP text.
>>>
>>> If what we want is a distinction between "please give me a correct
>>> wheel" and "please give me a wheel but I don't care if it's broken",
>>> then wouldn't it make more sense to have a simple flag saying *that*?
>>
>> No, because pip *also* wants the ability to request that the backend
>> put the intermediate build artifacts in a particular place,
>
> Say what? Where did they say this? I'm 99% sure this is just not true.

pip currently works by moving the input files out to a different
directory with shutil.copytree.
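
As a rough illustration of that copy-then-build approach (a minimal
sketch, not pip's actual code; the helper name and the "src"
subdirectory are made up for the example):

```python
import shutil
import tempfile
from pathlib import Path


def copy_source_tree(source_dir):
    """Copy source_dir into a fresh temporary directory; return the copy's path."""
    scratch_root = Path(tempfile.mkdtemp(prefix="build-copy-"))
    target = scratch_root / "src"  # must not exist yet: copytree creates it
    shutil.copytree(source_dir, target)
    return target
```

The build then runs against the returned copy, so the original
directory is never written to.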

For PEP 517, their default build strategy is going to be to take an
sdist and unpack it.

Both of those build strategies are particular ways of achieving an
out-of-tree build *without* relying directly on out-of-tree build
support in the backend.

However, if build_sdist fails, they *can't* use their preferred build
strategy, and need a way to ask the backend to do the best it
reasonably can to emulate a build via sdist, even though an sdist
build isn't technically feasible.

Originally, the proposed mechanism for that was the various
incarnations of the "prepare_input_for_build_wheel" hook, which had
the key downside of not aligning well with the way full-fledged build
systems actually work.

By contrast, Daniel & Thomas's suggestion of a build_directory
parameter to build_wheel nicely models a front end saying "Well, I
*would* have built an sdist and used that, but it didn't work, so
instead I'm asking you to do the best you can" in a way that *also*
aligns nicely with the way out-of-tree build support in full-fledged
build systems actually works (i.e. by specifying target
directories/build directories/variant directories appropriately).

>> *and*
>> having that ability will likely prove beneficial given directory based
>> caching schemes in build automation pipelines (with BitBucket
>> Pipelines and OpenShift Image Streams being the two I'm personally
>> familiar with, but it's a logical enough approach to speeding up build
>> pipelines that I'm sure there are others).
>
> Yeah, this is a really neat idea! I'm genuinely enthusiastic about it.
> Which... is why I think this is a great target for a future PEP, that
> can adequately address the complications that the current PEP is
> skimming over. As I argued in a previous email, I think these
> pipelines would actually *prefer* that out-of-place build be an
> optional feature, but it's basically impossible to even have the
> discussion about what they want properly as part of the core PEP 517
> discussion.

Even with build_directory defined, backends remain free to ignore it
and put their intermediate artifacts somewhere else. All the API
design does is provide them with the *option* of using the frontend
provided directory.
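
To make the "option, not obligation" point concrete, here is a
minimal sketch of how a backend might pick its intermediate directory.
The function name and the honour_frontend_request switch are
hypothetical and only stand in for whatever policy a real backend
would apply:

```python
import tempfile
from pathlib import Path


def choose_build_directory(build_directory=None, honour_frontend_request=True):
    """Pick where a hypothetical backend keeps its intermediate artifacts."""
    if build_directory is not None and honour_frontend_request:
        chosen = Path(build_directory)
        chosen.mkdir(parents=True, exist_ok=True)
        return chosen
    # No directory supplied, or the backend chooses to ignore it.
    return Path(tempfile.mkdtemp(prefix="backend-build-"))
```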

So that just becomes a quality of implementation issue that folks will
work out iteratively with backend developers.

>> It just turns out that we can piggy back off that in-place/out-of-tree
>> distinction to *also* indicate how much the frontend cares about
>> consistency with sdist builds (which the PEP previously *didn't* say,
>> but explicit text along those lines was added as part of
>> https://github.com/python/peps/pull/310/files based on this latest
>> discussion).
>
> Okay, but then this is... bad. You're taking two unrelated
> distinctions (in-place/out-of-place and sloppy/precise) and smashing
> them together. In particular, one of the types of build that Donald
> has said that he considers "sloppy" and worries about avoiding is any
> kind of incremental build. So if we take Donald's concern and your new
> PEP text literally, then it rules out incremental out-of-place builds.
> But incremental out-of-place builds are exactly what you need for the
> build pipeline case that you're citing as a motivation for this
> feature.
>
> Your PEP is trying to do too many things at once, and that means it's
> going to do them poorly.

Not really, because sloppy/precise isn't actually a distinction we
really care about; it's one that arises from the fact that in-place
builds with setuptools/distutils *are* sloppy about their inputs, and
hence starting with a "dirty" source directory may corrupt the sdists
and wheels produced from those directories. Full-fledged build systems
that work backwards through their dependency graph to their desired
inputs rather than relying on file globs and collecting entire
directory trees from local disk shouldn't be anywhere near as
vulnerable to the problem (although even for those the general
recommendation is that robust CI and release processes should always
start with a clean checkout to ensure that everything is properly
version controlled).

This means that the key benefit that Daniel & Thomas's suggestion of
the build_directory parameter provides is a clear indicator of *where
design responsibility lies* for the behaviour of a given build.

That is, from pip's point of view, the "normal" build path will be
"build_sdist -> unpack sdist -> in-place build_wheel". That's the
sequence that pip itself is directly responsible for.

However, we know that build_sdist may sometimes fail in cases where
build_wheel would have worked, so we *also* define a fallback that
delegates *all* design responsibility for the build to the backend:
out-of-tree build support.
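
Sketched as frontend logic, the normal path and the fallback look
roughly like the following. This is only an illustration: the three
callables are hypothetical stand-ins for the backend hooks and the
frontend's own unpacking step, their argument lists are simplified,
and treating any exception as "build_sdist failed" is an assumption of
the sketch rather than something the PEP text spells out here.

```python
def build_wheel_with_fallback(build_sdist, unpack_sdist, build_wheel,
                              source_dir, wheel_dir, scratch_dir):
    """Prefer building via an sdist; fall back to an out-of-tree build."""
    try:
        sdist_path = build_sdist(source_dir, scratch_dir)
    except Exception:
        # Assumption: any exception means "no sdist can be produced here".
        # Hand the backend the directory the sdist would have been unpacked into.
        return build_wheel(source_dir, wheel_dir, build_directory=scratch_dir)
    unpacked_dir = unpack_sdist(sdist_path, scratch_dir)
    # Normal path: in-place build inside the unpacked sdist.
    return build_wheel(unpacked_dir, wheel_dir, build_directory=None)
```

The only difference between the two branches is which tree is built
and whether a build directory is passed, which is the whole of the
signal the backend receives.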

That solves the problem from an ecosystem-level perspective because
it divides responsibility appropriately:

1. backend developers know the most about why their build_sdist
implementation might fail (triggering an out-of-tree build request)
2. backend developers know the most about ways starting with a "dirty"
source directory might corrupt their output
3. backend developers know the most about how to detect if they're
starting with an unpacked sdist or not
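
For point 3, one heuristic mentioned in this thread is keying off the
presence of PKG-INFO. A minimal sketch of that check (the function
name is made up, and a real backend may well want something more
robust):

```python
from pathlib import Path


def looks_like_unpacked_sdist(source_dir):
    """Heuristic: an unpacked sdist carries a PKG-INFO file at its top level."""
    return (Path(source_dir) / "PKG-INFO").is_file()
```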

The reason we *don't* define the delegation as being to an *in-place*
build is that we *know* we're not happy with that for the
setuptools/distutils status quo - if we were, this part of the design
discussion would never have arisen, and we'd never have had either the
build_directory parameter *or* the various incarnations of the "build
directory preparation" hook.

>>> And in what case would a frontend ever set this
>>> give_me_a_correct_wheel flag to False?
>>
>> When the frontend either genuinely doesn't care (hopefully rare, but
>> not inconceivable), or else when it's building from an unpacked sdist
>> and hence can be confident that the artifacts will be consistent with
>> each other regardless of how the backend handles the situation
>> (expected to be very common, since it's the path that will be followed
>> for sdists published to PyPI, and when handed an arbitrary PEP 517
>> source tree to build, pip will likely try "build_sdist -> in-place
>> build_wheel" first and only fall back to "out-of-tree build_wheel" if
>> the initial build_sdist call fails).
>
> Ah, the unpacked sdist case is a good point, I neglected that in my
> discussion of a possible setuptools build_wheel hook. But it's fine --
> even if we say that the setuptools build_wheel hook has to produce an
> "sdist-consistent" wheel in all cases, then it can detect whether it's
> building from an unpacked sdist (e.g. by keying off the presence of
> PKG-INFO), and in that case it knows MANIFEST.in has already been
> taken into account and it can go straight to bdist_wheel.
>
> And in fact, this is what we *want* it to key off of, *not* the
> in-place/out-of-place thing. Consider Debian: they want to do
> out-of-place builds of unpacked sdists. They're working with pristine
> unpacked sdists, so they don't want setuptools to go pack up a new
> sdist and then unpack it again for every out-of-place build. (In fact,
> this is probably a less-tested path than building directly from the
> unpacked sdist.) So if the point of the out-of-place build feature is
> that it's supposed to tell setuptools that it needs to go via sdist...
> then in this case it will do the wrong thing. Out-of-place and
> sdist-consistency are orthogonal concepts.

That's a frontend design question, not an API design question - while
I agree it would be logical for frontends to have a way for users to
say "this directory is an unpacked sdist, you can trust it and ask the
backend for an in-place build", that's neither here nor there when it
comes to designing the interface between frontends and backends.

>> We don't want to get into the business of micromanaging how backends
>> work (that's how we got into the current mess with distutils &
>> setuptools), we just want to make it possible for frontend developers
>> and backend developers to collaborate effectively over time.
>
> So, uh... you're saying that you don't like this proposal, because it
> has too much micro-management:
>
> - backends are required to produce sdist-consistent wheels

Yes, this qualifies as excessive micromanagement of backends, because
we *know* backends can't reliably satisfy it, and we also know we
don't actually need it (because this isn't a guarantee that
setuptools/distutils provides, and that's the incumbent build system).

> and instead prefer this proposal:
>
> - backends must support one configuration where they place
> intermediate artifacts in one specified place, and are not required to
> produce sdist-consistent wheels
> - backends must also support another configuration where they place
> intermediate artifacts in another place that's specified in a
> different way, and in this case they are required to produce
> sdist-consistent wheels

Producing sdist-consistent wheels is always *desired*.

PEP 517 acknowledges that it isn't always *possible*, and hence places
the following constraints on *front*ends in order to free up design
space for backends:

- if a frontend wants to *ensure* consistency, it has to create the
sdist first, and build from that
- if build_sdist fails, then the frontend should fall back to an
out-of-tree build using the build_directory parameter

For backends:

- for in-place builds, sdist-consistency is 100% the frontend's problem
- for out-of-tree builds, sdist-consistency is the backend's problem,
but mainly for cases where build_sdist would have failed

> I mean... the second proposal encompasses the first proposal, and then
> adds extra requirements, and those extra requirements are about fiddly
> internal things like where intermediate artifacts should be cached...
> surely the first proposal is less micromanage-y?

No. The first proposal overstates our requirements (we're technically
fine with backends producing inconsistent artifacts when handed a
"dirty" source directory), and it mainly looks less exacting because
you haven't defined what you actually mean by "sdist-consistent
wheels".

The current PEP instead breaks out some simpler *mechanical*
recommendations that provide some substance to what "sdist-consistent
wheels" actually means, and whether it's a concern the backend can
assume the frontend has already taken care of.

>> By contrast, `please do an out-of-tree build using this directory`
>> handles the same scenario (build_sdist failed, but build_wheel can
>> still be made to work) in a more elegant fashion by allowing the front
>> end to state *what* it wants (i.e. something that gets as close as is
>> practical to the "build_sdist -> unpack sdist -> in-place build_wheel"
>> path given the limitations of the current environment),
>
> We totally agree the frontend should state "*what* it wants". This is
> exactly my problem with the PEP -- it doesn't do this! Instead, it has
> frontends specify something they don't care about
> (in-place/out-of-place) and then overloads that to mean this unrelated
> thing.

Sure, I can see that concern - it arises from the fact that the way a
frontend tells a backend "I was trying to do 'build sdist -> unpack
sdist -> build wheel', but the 'build sdist' step failed" is by
passing in the directory where it would have unpacked the sdist as the
out-of-tree build directory.

However, the apparent conflation arises from asking the question: "If
we're going to support requesting out-of-tree builds in the backend
API anyway, do we need *another* flag to say 'I wanted to build via
sdist, but that didn't work'".

And I don't think we do - there isn't anything extra that a
well-behaved backend should do for an out-of-tree build *just* because
building an sdist failed, while there *are* things that a backend
should do for an out-of-tree build that it wouldn't do for an in-place
build.

Thus the distinction that actually models a meaningful real world
build management concept is in-place/out-of-tree (as evidenced by the
fact that it maps cleanly to most of the full-fledged build systems
we/I surveyed), and that also turns out to be sufficient to handle our
Python-specific problem of providing a fallback API for frontends to
use when build_sdist fails.

> My draft at the start of this thread was exactly designed by trying to
> find the intersection of (1) satisfying everyone's requirements, (2)
> restricting myself to operations whose semantics I could define in a
> generic, future-proof way.

It's a build system API - (2) is an utterly unattainable goal. The
most we can hope for is an API that makes it relatively
straightforward for end users, publishers, frontend developers and
backend developers to figure out how to resolve cases where an end
user's build fails (environmental issue, project build config issue,
frontend bug, backend bug).

>> That means I'm going to *explicitly* ask that you accept that the PEP
>> is going to be accepted, and it's going to be accepted with the API in
>> its current form, even if you personally don't agree with our
>> reasoning for all of the technical details. If your level of concern
>> around the build_directory parameter specifically is high enough that
>> you don't want to be listed as a co-author on PEP 517 anymore, then
>> that's entirely reasonable (we can add a separate Acknowledgments
>> section to recognise your significant input to the process without
>> implying your endorsement of the final result), but as long as the
>> accepted API ends up being supported in at least pip, flit, and
>> enscons, it honestly doesn't really matter all that much in practice
>> what the rest of us think of the design (we're here as design
>> advisors, rather than being the ones that will necessarily need to
>> cope with the bug reports arising from any interoperability
>> challenges).
>
> Clearly neither of us have convinced each other here. If you want to
> take over authorship of PEP 517, that's fine -- it's basically just
> acknowledging the current reality. But in that case I think you should
> step down as BDFL-delegate; even Guido doesn't accept his own PEPs.

This isn't my API design - it's one that Daniel suggested & Thomas
accepted into the PEP.

However, I like it as BDFL-Delegate, and think it resolves all the
previous concerns, so now I'm attempting to arbitrate a dispute
between the listed PEP authors (where Thomas likes the design and is
willing to implement it for both pip & flit, while you still have
concerns about it and definitely haven't indicated that you're willing
for it to be accepted with your name attached to it).

> Here's another possible way forward: you mold PEP 517 into what you
> think is best, I take my text from the beginning of this thread and
> base a new PEP off it, and then Donald gets stuck as BDFL-delegate and
> has to pick. (Sorry Donald.) What do you think? It would at least have
> the advantage that everyone else gets a chance to catch up and look
> things over -- I think at the moment it's basically only you and me
> who actually have strong opinions, and everyone else is barely
> following.

It isn't my design (I just approve of it and have been providing
updates to help make the PEP self-consistent in the interests of being
able to formally accept it), so I don't think this proposal makes
sense. However, it may make sense for you and Thomas to take the
discussion offline as PEP co-authors, and figure out a resolution that
satisfies you both, as I can't reasonably accept a PEP where one of
the listed co-authors has made it quite clear that they don't want it
to be accepted in its current form.

Whether that's a matter of Thomas becoming sole nominal author (with
your significant contributions being recognised via an
Acknowledgements section), or your deciding that you're willing to
concede the point and reserve the right to tell us all "I told you
so!" in a couple of years' time, or something else entirely, I don't
mind (although I'd hope for a resolution that doesn't involve changing
the proposed API yet again), but I do want to be clear: as
BDFL-Delegate, I am 100% fine with the current technical proposal in
PEP 517, as I think it addresses all of the design requirements that
have been raised, can be readily implemented in both frontends and
backends, and aligns well with the conventions of full-fledged build
management systems.

The clarity of the prose has definitely suffered from the long design
process, but I think the real solution to that is going to be fleshing
out the specifications section in PyPUG as the main source of
reference information, rather than relying on the PEPs themselves for
that (since that particular problem is far from being unique to PEP
517).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

