Re: [Distutils] Update to my skeletal PEP for a new build system interface

Nov. 9, 2015

On 9 November 2015 at 17:21, Nathaniel Smith <njs@pobox.com> wrote:
> On Mon, Nov 9, 2015 at 7:34 AM, Paul Moore <p.f.moore@gmail.com> wrote:
>> On 9 November 2015 at 05:20, Nathaniel Smith <njs@pobox.com> wrote:
>>> A *source tree* is something like a VCS checkout. We need a standard
>>> interface for installing from this format, to support usages like
>>> ``pip install some-directory/``.
>>
>> I still find these two definitions unhelpful, sorry.
>>
>> We don't *need* an interface to install from a source tree. It's
>> entirely feasible to have a standard interface to build a sdist from a
>> source tree and go source tree -> sdist -> wheel -> install. That
>> doesn't cater for editable installs, nor does it cater for reusing
>> things like object files from previous builds, so there may be
>> *benefits* to having a richer interface than this, but it's wrong to
>> say it's needed.
>
> I am confuse. All that sentence is saying is that (a) it is useful to
> have the phrase "source tree" as distinct from "sdist" so we can talk
> about them (which I assume you agree about because you use that phrase
> in your response :-)),

Agreed.

> and (b) there must be *some* interface that
> allows people to type "pip install some-directory/" and have it work
> because that's a feature we have to support (which I assume you agree
> about because you immediately propose an interface for supporting that
> feature).

Are we talking at cross purposes here? The end user interface "pip
install directory" is OK. What I think this PEP is saying is that we
need a way for pip to *implement* that functionality in terms of
primitive operations that the "source tree" must support. That, again,
I'm fine with. But you're then saying (I think) that the primitive
operation a source tree must provide is an "install" operation - and
that's what I fundamentally disagree with. The source tree should
provide a "build" primitive. If we agree on that (which I think we do,
but I don't think the PEP says so), then there's still a further
point, on which I think we do disagree, and that's over sdists.

I think that there are *two* steps within the build process, and these
need to be separated out:

1. Make a structured archive of the project's sources. This includes
creation of all generated source files that can be created in a
target-independent way. This would include (static) metadata,
generated source files such as cython output, etc. The point about
this archive is that it is fully target-independent, and building
wheels from it does not require any tools that are not fundamentally
target-dependent (such as a compiler). This is what I consider to be
the "sdist". There
should only ever need to be one sdist for a given name/version of a
project, precisely because it's totally portable, by design.

2. Create target-dependent installable wheels. This is the "build"
step, in the sense that it's when you run a compiler to create
platform-specific binaries.
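The target-independence point shows up in how the two artifacts are named. The Python sketch below is illustrative only. The sdist follows the usual `{name}-{version}.tar.gz` convention, so its filename depends on nothing but name and version. A wheel filename, per PEP 427, also records the Python, ABI and platform tags it was built for. The project name "example" and the tag values are made-up examples, not real releases.

```python
# Sketch: sdists are keyed by name and version only; wheels also carry
# the target (python, ABI, platform tags) they were built for.
# "example" and the tag values below are illustrative, not real releases.


def sdist_filename(name: str, version: str) -> str:
    # Target-independent: one sdist per name/version of a project.
    return f"{name}-{version}.tar.gz"


def wheel_filename(name: str, version: str, python_tag: str,
                   abi_tag: str, platform_tag: str) -> str:
    # Target-dependent: one wheel per (python, ABI, platform) combination,
    # following the PEP 427 naming scheme (optional build tag omitted).
    return f"{name}-{version}-{python_tag}-{abi_tag}-{platform_tag}.whl"


if __name__ == "__main__":
    print(sdist_filename("example", "1.0"))
    for tags in [("cp34", "cp34m", "linux_x86_64"),
                 ("cp34", "none", "win_amd64")]:
        print(wheel_filename("example", "1.0", *tags))
```

Run as a script, it prints a single sdist name and one wheel name per target, all derived from the same name/version pair.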

With this model, the install process is specifically

source tree ---> sdist ---> wheel ---> installed package
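Here is a minimal Python sketch of that model, with a frontend such as pip orchestrating the three steps. The hook names `build_sdist` and `build_wheel` are illustrative names chosen for this sketch, as is the stand-in `ToyBackend`. Installation is modelled as a dictionary update rather than real file copying.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Sdist:
    name: str
    version: str


@dataclass(frozen=True)
class Wheel:
    name: str
    version: str
    target: str  # e.g. a platform tag such as "linux_x86_64"


class Backend(Protocol):
    # Illustrative hook names for the two atomic build steps.
    def build_sdist(self, source_tree: str) -> Sdist: ...
    def build_wheel(self, sdist: Sdist, target: str) -> Wheel: ...


def install_from_tree(source_tree: str, backend: Backend, target: str,
                      site: dict[str, Wheel], log: list[str]) -> Wheel:
    """Frontend-managed pipeline: source tree -> sdist -> wheel -> installed."""
    sdist = backend.build_sdist(source_tree)    # step 1: target-independent
    log.append(f"sdist {sdist.name}-{sdist.version}")
    wheel = backend.build_wheel(sdist, target)  # step 2: target-dependent
    log.append(f"wheel for {wheel.target}")
    site[wheel.name] = wheel                    # step 3: install (frontend's job)
    log.append(f"installed {wheel.name}")
    return wheel


class ToyBackend:
    """Stand-in backend; a real one would archive and compile files."""

    def build_sdist(self, source_tree: str) -> Sdist:
        return Sdist("example", "1.0")

    def build_wheel(self, sdist: Sdist, target: str) -> Wheel:
        return Wheel(sdist.name, sdist.version, target)
```

Each step is a separate call, so the frontend stays in control between them. It can stop after step 1 to publish the sdist, or cache the wheel from step 2. A single "install from source tree" hook would take exactly that management away from it.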

It is possible that tools could merge some of these steps, but a
generic tool like pip that manages the running of the steps in an
appropriate order needs to work in terms of the fundamental building
blocks. So I am strongly opposed to proposals that treat source tree
---> wheel as a primitive operation, because they hamper pip's ability
to manage things at the level of the fundamental steps.

One of the worst aspects of distutils, and one that pip is still far
from free of, is the fact that distutils provides merged steps like
source tree ---> installed package, and we (mistakenly, in hindsight)
used them to "optimise" the way pip works. It did optimise things in
some ways, I guess, but it makes it really hard to disentangle things
when we want to modularise processing.

The above is of course idealised. Editable installs are one example of
something that simply doesn't follow this pattern, and as far as I can
see they make no sense *except* as a source tree --> editable install
one-step operation. Also, modularising the steps to this extent does
have downsides - separating source tree --> sdist and sdist --> wheel
makes it harder to do "in place rebuild" optimisations. We can agree
or disagree on the trade-offs, or we can work on trying to get the
best of both worlds, but I still think we should be starting
(certainly when working at the spec/PEP level) from a clean conceptual
model.

> It sounds like we do disagree about the details of what this interface
> should look like and thus how "pip install some-directory/" should
> work internally, but that's not a problem with the definition (or
> indeed something that this PEP's text currently takes any stance on at
> all :-)).

As I say, I think we're talking at cross purposes. I read the PEP as
trying to specify (the wrong) primitives for pip to use. I'm not sure
what you intend the PEP to say - maybe that "pip install <directory>"
is the canonical install command? I don't think that needs a PEP, it's
just how pip works (and other tools may choose to expose things in a
different manner).

>> I suspect you're reluctant to require a "source tree -> sdist"
>> interface, because the author of flit isn't comfortable with having
>> such a thing. That's OK - if you want to note that a benefit of going
>> direct to install (or wheel) is that tools that don't allow you to
>> create a sdist are supported, then let's make that explicit. Expect
>> plenty of pushback on the idea of tools that don't supply sdists
>> though...
>
> I actually haven't talked to Thomas about this particular point at
> all, and actually part of what started all this was my looking at flit
> and going "this is cool, but c'mon, you can't just throw away sdists"
> :-).
>
> The reason I'm reluctant to require a "source tree -> sdist" interface
> is described here:
>     https://mail.python.org/pipermail/distutils-sig/2015-November/027636.html
> and also at the very top of this long email (which for some reason I
> can't seem to find in the mail.python.org archives?):
>     https://www.mail-archive.com/distutils-sig@python.org/msg23144.html
>
> The TL;DR is: obviously we need source tree -> sdist operations
> somewhere, and obviously we need mechanisms to increase the
> reliability of builds -- we all agree that there's some irreducible
> complexity there, those issues need to be addressed, the question is
> just where to put that complexity. I think putting it into the PEP for
> the build frontend <-> build backend interface is the wrong place,
> because it increases spec complexity (the worst kind of complexity)
> and it rules out the useful feature of incremental rebuilds. (And by
> "useful feature" there I mean "if we regress from distutils by failing
> to support this, then there's a good chance downstream devs will
> simply refuse to use our new design".)

But here I think we have a new term that's adding confusion. Pip isn't
a "build frontend". In 99% of cases pip does no building at all.

Basically, pip is a manager of build and install steps, and to manage
those steps successfully, it needs clear definitions of the steps
involved. In the extreme case, if there's a step "take a source tree
and install it", you've left nothing for pip to manage, and you may as
well go back to setup.py install.

I think that extracting and formalising the fundamental ("atomic" if
you like) steps that constitute going from a source tree to an
installed package is precisely the sort of simplification a spec/PEP
*must* do. In doing so, there are engineering trade-offs such as how
we reintroduce incremental rebuilds without compromising the model.
Such trade-offs may imply a need to add complexity to the spec (maybe
in terms of optional "combined" steps such as source tree --> wheel),
but it should be clear that these are (a) optional (as in, the process
works fine with just the atomic steps) and (b) optimisations (as in,
they can't alter the ultimate behaviour as defined in terms of atomic
steps).
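Points (a) and (b) can be expressed as a minimal Python sketch. A backend *may* offer a combined source tree ---> wheel step, for example to reuse object files from a previous build. The frontend falls back to the atomic steps when that step is absent. The combined step is only acceptable if it produces the same wheel as the atomic route. All class and hook names here, including `build_wheel_from_tree`, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Wheel:
    name: str
    version: str
    target: str


class AtomicBackend:
    """Toy backend providing only the two atomic steps (hypothetical hooks)."""

    def build_sdist(self, tree: str) -> tuple[str, str]:
        return ("example", "1.0")

    def build_wheel(self, sdist: tuple[str, str], target: str) -> Wheel:
        name, version = sdist
        return Wheel(name, version, target)


class IncrementalBackend(AtomicBackend):
    """Toy backend that also offers an optional combined step."""

    def build_wheel_from_tree(self, tree: str, target: str) -> Wheel:
        # A real backend might reuse object files from a previous build here.
        return Wheel("example", "1.0", target)


def wheel_from_tree(backend, tree: str, target: str) -> Wheel:
    # (a) Optional: use the combined step only if the backend provides it.
    combined = getattr(backend, "build_wheel_from_tree", None)
    if combined is not None:
        return combined(tree, target)
    # Fallback: the process must work with just the atomic steps.
    return backend.build_wheel(backend.build_sdist(tree), target)


def check_equivalent(backend, tree: str, target: str) -> bool:
    # (b) The optimisation must not alter the result defined by the atomic steps.
    atomic = backend.build_wheel(backend.build_sdist(tree), target)
    return wheel_from_tree(backend, tree, target) == atomic
```

Because the atomic route is always available, a frontend could use something like `check_equivalent` in a test suite to hold backends to the rule. Day-to-day builds can still take the faster combined path.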

>> Certainly your definition of a sdist is general enough that it doesn't
>> preclude such things. But on the other hand, it doesn't offer any
>> suggestion that this is an important feature of a sdist (and it is - I
>> say that as someone who has needed to build wheels from a sdist and
>> doesn't have Cython installed). From your definition, people will
>> infer that zipping up a development directory makes a sdist, and so
>> that's what they'll do. Because after all, making Cython a build
>> requirement and generating the C at build time is *also* an option,
>> it's just not as friendly to the average user.
>
> Hmm, I certainly agree that it doesn't preclude such things, because I
> am very aware of this use case (I maintain projects that handle Cython
> in exactly the way you describe), and it never occurred to me that
> this could *not* be supported :-). I'm not sure what you're worried
> about exactly? Right now, zipping up a development directory actually
> is a valid way of making an sdist, and nonetheless projects actually
> do go to elaborate lengths to trick distutils into including generated
> .c files. So I don't think it's likely they'll stop because of some
> PEP that neglected to explicitly point out that this was possible :-).
> But if you think the wording could be improved I'm certainly open to
> that.

I think that we currently have so much confusion over "what a sdist
is" that a new over-general definition isn't going to help. What we
need to do is to *pin down* the definition of a sdist, not allow the
term to continue to mean too much (and hence, ultimately, very
little).

Does my definition of a sdist above in terms of being
target-independent but containing all files that can be generated in a
target-independent way clarify what I'm intending? I'd be happy if
there was wording that left it as optional how much a project needed
to eliminate build dependencies by including the output of those
dependencies in the sdist, but I'd much prefer it if there was a
strong implication that if files could be generated without reference
to the target architecture, and doing so eliminated a build
dependency, then they should. (To give a specific example, I'd prefer
it if it was clear that sdists should always include C sources
generated by cython - even though that requirement isn't enforceable
in any practical sense).
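The requirement can't be enforced in general, so the most a tool could do is warn. The minimal Python sketch below shows one such check. It assumes Cython's usual layout, where `foo.c` is written next to `foo.pyx`; projects that put generated C elsewhere would need a different rule. The function names are hypothetical, and only the standard library `tarfile` module is used to read the archive.

```python
import tarfile
from pathlib import PurePosixPath


def missing_generated_c(member_names: list[str]) -> list[str]:
    """Return .pyx files in the archive that have no matching .c file.

    Assumes the generated C sits alongside the .pyx with the same stem.
    """
    names = set(member_names)
    missing = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix == ".pyx":
            c_file = str(path.with_suffix(".c"))
            if c_file not in names:
                missing.append(name)
    return missing


def check_sdist(sdist_path: str) -> list[str]:
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz) itself.
    with tarfile.open(sdist_path, "r:*") as archive:
        return missing_generated_c(archive.getnames())
```

An empty result means every Cython source has its C output shipped. Anything listed would force users of the sdist to have Cython installed as a build dependency.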

> (I guess I do have some generic preference that we not insist on PEPs
> serving as end-user documentation -- the intended audience here is
> experts, the definitions are written to mean exactly what they say,
> etc., and there are real trade-offs between being precise and being
> easily comprehensible by non-experts. But I also would like you to be
> happy :-).)

Agreed we don't intend these things to be for end users. But I think
it's important that the experts have something detailed and precise,
as ultimately they'll have to implement code based on the PEP. And
worse still, anyone wanting to implement an alternative to pip has a
right to expect that everything they need is in a PEP, not in
"people's understanding".

I don't know if it's clear (I hope it is but it's hard to be sure :-))
but my comments are from the perspective of someone who knows the
internals of pip, but would like to be able to (re-) write it without
ever having to refer to pip's code in order to do so. I think that's a
reasonable goal to aim for, as not being able to do that is precisely
what got us into the mess where we daren't touch distutils because we
don't know what it's supposed to do other than "what it does"...

Thanks for considering my happiness :-) It's not too easy to make me
miserable, so don't worry - the big issue is that I enjoy long complex
detail-oriented debates, so you're better off not trying *too* hard to
increase my happiness in that direction!!! :-)

Paul