Update to my skeletal PEP for a new build system interface
Hi all,
Here's a quick update to my draft PEP for a new build system
interface, last seen here:
https://mail.python.org/pipermail/distutils-sig/2015-October/027360.html
There isn't terribly much here, and Robert and I should really figure
out how to reconcile what we have, but since I was rearranging some
stuff anyway and prepping possible new sections, I figured I'd at
least post this. I addressed all the previous comments so hopefully it
is boring and non-controversial :-).
Changes:
- It wasn't clear that the sdist metadata stuff was really helping, so
I took it out for now. So it doesn't get lost, I split it out as a
standalone deferred-status PEP to the pypa repository:
https://github.com/pypa/interoperability-peps/pull/57
- Rewrote stuff to deal with Paul's comments
- Added new terminology: "build frontend" for something like pip, and
"build backend" for the project specific hooks that pip calls. Seems
helpful.
-n
----
PEP: ??
Title: A build-system independent format for source trees
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith
So I think the big thing to reconcile is the thing where I say out of scope - using static metadata in distributions.
The dynamic/non-dynamic split may or may not be needed; but examining source trees and sdists as different only makes sense to me if you're gaining something - and pregenerated declarative dependencies might be one of those things. OTOH since we haven't solved the numpy ABI problem yet... perhaps that's premature.
btw - note that this:
----
2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to stop doing that.
----
is actually a feature today. egg_info gets you the follow-these-rules-locally with older pip versions like 1.5.4, and bdist_wheel gets you a marker-ready set of dependencies.
-Rob
On 9 November 2015 at 05:20, Nathaniel Smith wrote:
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
I still find these two definitions unhelpful, sorry.
We don't *need* an interface to install from a source tree. It's entirely feasible to have a standard interface to build a sdist from a source tree and go source tree -> sdist -> wheel -> install. That doesn't cater for editable installs, nor does it cater for reusing things like object files from previous builds, so there may be *benefits* to having a richer interface than this, but it's wrong to say it's needed.
I suspect you're reluctant to require a "source tree -> sdist" interface, because the author of flit isn't comfortable with having such a thing. That's OK - if you want to note that a benefit of going direct to install (or wheel) is that tools that don't allow you to create a sdist are supported, then let's make that explicit. Expect plenty of pushback on the idea of tools that don't supply sdists though...
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
Source distributions are also known as *sdists* for short.
One key feature of the current sdists that you are either overlooking or ignoring is that they can, and do, contain *built* files. The best example is projects using Cython. The sdist contains generated C files, so that users building wheels from the sdist don't need cython installed.
Certainly your definition of a sdist is general enough that it doesn't preclude such things. But on the other hand, it doesn't offer any suggestion that this is an important feature of a sdist (and it is - I say that as someone who has needed to build wheels from a sdist and doesn't have Cython installed). From your definition, people will infer that zipping up a development directory makes a sdist, and so that's what they'll do. Because after all, making Cython a build requirement and generating the C at build time is *also* an option, it's just not as friendly to the average user.
Paul
On November 9, 2015 at 10:35:24 AM, Paul Moore (p.f.moore@gmail.com) wrote:
On 9 November 2015 at 05:20, Nathaniel Smith wrote:
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
I still find these two definitions unhelpful, sorry.
We don't *need* an interface to install from a source tree. It's entirely feasible to have a standard interface to build a sdist from a source tree and go source tree -> sdist -> wheel -> install. That doesn't cater for editable installs, nor does it cater for reusing things like object files from previous builds, so there may be *benefits* to having a richer interface than this, but it's wrong to say it's needed.
I suspect you're reluctant to require a "source tree -> sdist" interface, because the author of flit isn't comfortable with having such a thing. That's OK - if you want to note that a benefit of going direct to install (or wheel) is that tools that don't allow you to create a sdist are supported, then let's make that explicit. Expect plenty of pushback on the idea of tools that don't supply sdists though…
Regardless of whether we end up mandating a source tree -> sdist -> wheel -> install path or if we support two paths, source tree -> sdist -> wheel -> install and source tree -> wheel -> install, I don’t think it’s likely we’re going to ever get to a place that sdists are an optional or non-standard artifact or interface.
I think it is mandatory that we treat (and recognize) that a sdist is different than an arbitrary directory and that they may (or may not) have a structure that matches what it looks like on disk. It is entirely possible (and likely) that at some point in the future we will have a new sdist format that looks less like someone just zipped up their VCS checkout and more like a structured format.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Mon, Nov 9, 2015 at 7:34 AM, Paul Moore wrote:
On 9 November 2015 at 05:20, Nathaniel Smith wrote:
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
I still find these two definitions unhelpful, sorry.
We don't *need* an interface to install from a source tree. It's entirely feasible to have a standard interface to build a sdist from a source tree and go source tree -> sdist -> wheel -> install. That doesn't cater for editable installs, nor does it cater for reusing things like object files from previous builds, so there may be *benefits* to having a richer interface than this, but it's wrong to say it's needed.
I am confuse. All that sentence is saying is that (a) it is useful to have the phrase "source tree" as distinct from "sdist" so we can talk about them (which I assume you agree about because you use that phrase in your response :-)), and (b) there must be *some* interface that allows people to type "pip install some-directory/" and have it work because that's a feature we have to support (which I assume you agree about because you immediately propose an interface for supporting that feature). It sounds like we do disagree about the details of what this interface should look like and thus how "pip install some-directory/" should work internally, but that's not a problem with the definition (or indeed something that this PEP's text currently takes any stance on at all :-)).
I suspect you're reluctant to require a "source tree -> sdist" interface, because the author of flit isn't comfortable with having such a thing. That's OK - if you want to note that a benefit of going direct to install (or wheel) is that tools that don't allow you to create a sdist are supported, then let's make that explicit. Expect plenty of pushback on the idea of tools that don't supply sdists though...
I actually haven't talked to Thomas about this particular point at all, and actually part of what started all this was my looking at flit and going "this is cool, but c'mon, you can't just throw away sdists" :-).
The reason I'm reluctant to require a "source tree -> sdist" interface is described here:
https://mail.python.org/pipermail/distutils-sig/2015-November/027636.html
and also at the very top of this long email (which for some reason I can't seem to find in the mail.python.org archives?):
https://www.mail-archive.com/distutils-sig@python.org/msg23144.html
The TL;DR is: obviously we need source tree -> sdist operations somewhere, and obviously we need mechanisms to increase the reliability of builds -- we all agree that there's some irreducible complexity there, those issues need to be addressed, the question is just where to put that complexity. I think putting it into the PEP for the build frontend <-> build backend interface is the wrong place, because it increases spec complexity (the worst kind of complexity) and it rules out the useful feature of incremental rebuilds. (And by "useful feature" there I mean "if we regress from distutils by failing to support this, then there's a good chance downstream devs will simply refuse to use our new design".)
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
Source distributions are also known as *sdists* for short.
One key feature of the current sdists that you are either overlooking or ignoring is that they can, and do, contain *built* files. The best example is projects using Cython. The sdist contains generated C files, so that users building wheels from the sdist don't need cython installed.
Certainly your definition of a sdist is general enough that it doesn't preclude such things. But on the other hand, it doesn't offer any suggestion that this is an important feature of a sdist (and it is - I say that as someone who has needed to build wheels from a sdist and doesn't have Cython installed). From your definition, people will infer that zipping up a development directory makes a sdist, and so that's what they'll do. Because after all, making Cython a build requirement and generating the C at build time is *also* an option, it's just not as friendly to the average user.
Hmm, I certainly agree that it doesn't preclude such things, because I am very aware of this use case (I maintain projects that handle Cython in exactly the way you describe), and it never occurred to me that this could *not* be supported :-). I'm not sure what you're worried about exactly? Right now, zipping up a development directory actually is a valid way of making an sdist, and nonetheless projects actually do go to elaborate lengths to trick distutils into including generated .c files. So I don't think it's likely they'll stop because of some PEP that neglected to explicitly point out that this was possible :-). But if you think the wording could be improved I'm certainly open to that.
(I guess I do have some generic preference that we not insist on PEPs serving as end-user documentation -- the intended audience here is experts, the definitions are written to mean exactly what they say, etc., and there are real trade-offs between being precise and being easily comprehensible by non-experts. But I also would like you to be happy :-).)
-n
--
Nathaniel J. Smith -- http://vorpus.org
On 9 November 2015 at 17:21, Nathaniel Smith wrote:
On Mon, Nov 9, 2015 at 7:34 AM, Paul Moore wrote:
On 9 November 2015 at 05:20, Nathaniel Smith wrote:
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
I still find these two definitions unhelpful, sorry.
We don't *need* an interface to install from a source tree. It's entirely feasible to have a standard interface to build a sdist from a source tree and go source tree -> sdist -> wheel -> install. That doesn't cater for editable installs, nor does it cater for reusing things like object files from previous builds, so there may be *benefits* to having a richer interface than this, but it's wrong to say it's needed.
I am confuse. All that sentence is saying is that (a) it is useful to have the phrase "source tree" as distinct from "sdist" so we can talk about them (which I assume you agree about because you use that phrase in your response :-)),
Agreed.
and (b) there must be *some* interface that allows people to type "pip install some-directory/" and have it work because that's a feature we have to support (which I assume you agree about because you immediately propose an interface for supporting that feature).
Are we talking at cross purposes here? The end user interface "pip install directory" is OK. What I think this PEP is saying is that we need a way for pip to *implement* that functionality in terms of primitive operations that the "source tree" must support. That, again, I'm fine with. But you're then saying (I think) that the primitive operation a source tree must provide is an "install" operation - and that's what I fundamentally disagree with. The source tree should provide a "build" primitive.
If we agree on that (which I think we do, but I don't think the PEP says so), then there's still a further point, on which I think we do disagree, and that's over sdists. I think that there are *two* steps within the build process, and these need to be separated out:
1. Make a structured archive of the project's sources. This includes creation of all generated source files that can be created in a target-independent way. This would include (static) metadata, generated source files such as cython output, etc. The point about this archive is that it is fully target-independent, and does not require any tools to build it that are not fundamentally target-dependent. This is what I consider to be the "sdist". There should only ever need to be one sdist for a given name/version of a project, precisely because it's totally portable, by design.
2. Create target-dependent installable wheels. This is the "build" step, in the sense that it's when you run a compiler to create platform-specific binaries.
With this model, the install process is specifically
source tree ---> sdist ---> wheel ---> installed package
It is possible that tools could merge some of these steps, but a generic tool like pip that manages the running of the steps in an appropriate order needs to work in terms of the fundamental building blocks.
So I am strongly opposed to proposals that treat source tree ---> wheel as a primitive operation, because they hamper pip's ability to manage things at the level of the fundamental steps. One of the worst aspects of distutils, and one that pip is still far from free of, is the fact that distutils provides merged steps like source tree ---> installed package, and we (mistakenly, in hindsight) used them to "optimise" the way pip works. It did optimise things in some ways, I guess, but it makes it really hard to disentangle things when we want to modularise processing.
The above is of course idealised. Editable installs are one example of something that simply doesn't follow this pattern, and as far as I can see they make no sense *except* as a source tree --> editable install one-step operation. Also, modularising the steps to this extent does have downsides - separating source tree --> sdist and sdist --> wheel makes it harder to do "in place rebuild" optimisations. We can agree or disagree on the trade-offs, or we can work on trying to get the best of both worlds, but I still think we should be starting (certainly when working at the spec/PEP level) from a clean conceptual model.
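To make the idealised model above concrete, here is a minimal sketch of the three steps as separate primitives that a pip-like tool runs in order. Everything in it is hypothetical and exists only for illustration: the ``Artifact`` class, the function names, and the ``linux_x86_64`` target string are assumptions, not part of any PEP or of pip. Nothing is actually built; the point is only that the sdist step carries no target and the wheel step is the first one that does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for one stage of the pipeline."""
    kind: str          # "source tree", "sdist", "wheel" or "installed package"
    name: str
    target: str = ""   # empty string means target-independent


def _expect(artifact: Artifact, kind: str) -> None:
    # Each primitive accepts exactly one kind of input.
    if artifact.kind != kind:
        raise ValueError(f"expected a {kind}, got a {artifact.kind}")


def make_sdist(tree: Artifact) -> Artifact:
    # Step 1: target-independent archive (metadata, generated sources).
    _expect(tree, "source tree")
    return Artifact("sdist", tree.name)


def build_wheel(sdist: Artifact, target: str) -> Artifact:
    # Step 2: the first step that depends on the target platform.
    _expect(sdist, "sdist")
    return Artifact("wheel", sdist.name, target)


def install_wheel(wheel: Artifact) -> Artifact:
    _expect(wheel, "wheel")
    return Artifact("installed package", wheel.name, wheel.target)


def install_from_tree(tree: Artifact, target: str) -> List[Artifact]:
    """Run the primitives in order and keep every intermediate artifact."""
    sdist = make_sdist(tree)
    wheel = build_wheel(sdist, target)
    return [tree, sdist, wheel, install_wheel(wheel)]


stages = install_from_tree(Artifact("source tree", "example"), "linux_x86_64")
print(" ---> ".join(a.kind for a in stages))
```

Because each intermediate artifact is returned, the managing tool can stop, cache, or inspect at any boundary, which is the property the merged distutils steps make hard to recover.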
It sounds like we do disagree about the details of what this interface should look like and thus how "pip install some-directory/" should work internally, but that's not a problem with the definition (or indeed something that this PEP's text currently takes any stance on at all :-)).
As I say, I think we're talking at cross purposes. I read the PEP as trying to specify (the wrong) primitives for pip to use. I'm not sure what you intend the PEP to say - maybe that "pip install <directory>" is the canonical install command? I don't think that needs a PEP, it's just how pip works (and other tools may choose to expose things in a different manner).
I suspect you're reluctant to require a "source tree -> sdist" interface, because the author of flit isn't comfortable with having such a thing. That's OK - if you want to note that a benefit of going direct to install (or wheel) is that tools that don't allow you to create a sdist are supported, then let's make that explicit. Expect plenty of pushback on the idea of tools that don't supply sdists though...
I actually haven't talked to Thomas about this particular point at all, and actually part of what started all this was my looking at flit and going "this is cool, but c'mon, you can't just throw away sdists" :-).
The reason I'm reluctant to require a "source tree -> sdist" interface is described here: https://mail.python.org/pipermail/distutils-sig/2015-November/027636.html
and also at the very top of this long email (which for some reason I can't seem to find in the mail.python.org archives?): https://www.mail-archive.com/distutils-sig@python.org/msg23144.html
The TL;DR is: obviously we need source tree -> sdist operations somewhere, and obviously we need mechanisms to increase the reliability of builds -- we all agree that there's some irreducible complexity there, those issues need to be addressed, the question is just where to put that complexity. I think putting it into the PEP for the build frontend <-> build backend interface is the wrong place, because it increases spec complexity (the worst kind of complexity) and it rules out the useful feature of incremental rebuilds. (And by "useful feature" there I mean "if we regress from distutils by failing to support this, then there's a good chance downstream devs will simply refuse to use our new design".)
But here I think we have a new term that's adding confusion. Pip isn't a "build frontend". In 99% of cases pip does no building at all. Basically, pip is a manager of build and install steps, and to manage those steps successfully, it needs clear definitions of the steps involved. In the extreme case, if there's a step "take a source tree and install it" you've left nothing for pip to manage, and you may as well go back to setup.py install.
I think that extracting and formalising the fundamental ("atomic" if you like) steps that constitute going from a source tree to an installed package, is precisely the sort of simplification a spec/PEP *must* do. In doing so, there are engineering trade-offs such as how we reintroduce incremental rebuilds without compromising the model. Such trade-offs may imply a need to add complexity to the spec (maybe in terms of optional "combined" steps such as source tree --> wheel), but it should be clear that these are (a) optional (as in, the process works fine with just the atomic steps) and (b) optimisations (as in, they can't alter the ultimate behaviour as defined in terms of atomic steps).
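The two conditions on combined steps, (a) optional and (b) pure optimisation, can be sketched as follows. This is an illustration under stated assumptions, not a proposed interface: the function names are invented here, and plain strings stand in for real artifacts so that "same ultimate behaviour" can be checked by simple equality.

```python
from typing import Callable, Optional


def make_sdist(tree: str) -> str:
    # Atomic step 1 (hypothetical): source tree --> sdist.
    return f"sdist({tree})"


def build_wheel(sdist: str) -> str:
    # Atomic step 2 (hypothetical): sdist --> wheel.
    return f"wheel({sdist})"


def wheel_via_atomic_steps(tree: str) -> str:
    # The reference behaviour is defined purely by the atomic steps.
    return build_wheel(make_sdist(tree))


def get_wheel(tree: str,
              combined: Optional[Callable[[str], str]] = None) -> str:
    # (a) Optional: with no combined step the process still works.
    if combined is None:
        return wheel_via_atomic_steps(tree)
    return combined(tree)


def fast_tree_to_wheel(tree: str) -> str:
    # Hypothetical combined step, e.g. one that reuses earlier object
    # files. (b) It must yield what the atomic steps would have.
    return f"wheel(sdist({tree}))"


reference = get_wheel("proj")
optimised = get_wheel("proj", combined=fast_tree_to_wheel)
print(reference == optimised)
```

In a real tool the equality check would have to be replaced by whatever notion of equivalent wheels the spec settles on; the sketch only shows where that check sits.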
Certainly your definition of a sdist is general enough that it doesn't preclude such things. But on the other hand, it doesn't offer any suggestion that this is an important feature of a sdist (and it is - I say that as someone who has needed to build wheels from a sdist and doesn't have Cython installed). From your definition, people will infer that zipping up a development directory makes a sdist, and so that's what they'll do. Because after all, making Cython a build requirement and generating the C at build time is *also* an option, it's just not as friendly to the average user.
Hmm, I certainly agree that it doesn't preclude such things, because I am very aware of this use case (I maintain projects that handle Cython in exactly the way you describe), and it never occurred to me that this could *not* be supported :-). I'm not sure what you're worried about exactly? Right now, zipping up a development directory actually is a valid way of making an sdist, and nonetheless projects actually do go to elaborate lengths to trick distutils into including generated .c files. So I don't think it's likely they'll stop because of some PEP that neglected to explicitly point out that this was possible :-). But if you think the wording could be improved I'm certainly open to that.
I think that we currently have so much confusion over "what a sdist is" that a new over-general definition isn't going to help. What we need to do is to *pin down* the definition of a sdist, not allow the term to continue to mean too much (and hence, ultimately, very little). Does my definition of a sdist above in terms of being target-independent but containing all files that can be generated in a target-independent way clarify what I'm intending?
I'd be happy if there was wording that left it as optional how much a project needed to eliminate build dependencies by including the output of those dependencies in the sdist, but I'd much prefer it if there was a strong implication that if files could be generated without reference to the target architecture, and doing so eliminated a build dependency, then they should. (To give a specific example, I'd prefer it if it was clear that sdists should always include C sources generated by cython - even though that requirement isn't enforceable in any practical sense).
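As a small illustration of the Cython example, the check below lists ``.pyx`` files in an sdist's file listing that have no generated ``.c`` file next to them. It is a sketch under assumptions: the function name is made up, it only looks for a same-named ``.c`` file in the same directory (Cython can also emit ``.cpp``, which this ignores), and it works on a list of names rather than opening a real archive.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def pyx_without_generated_c(names: Iterable[str]) -> List[str]:
    """Return .pyx members that lack a sibling .c file of the same stem."""
    paths = {PurePosixPath(n) for n in names}
    return sorted(
        str(p)
        for p in paths
        if p.suffix == ".pyx" and p.with_suffix(".c") not in paths
    )


# Hypothetical file listing of an sdist.
listing = [
    "example-1.0/setup.py",
    "example-1.0/example/fast.pyx",
    "example-1.0/example/fast.c",    # generated C shipped: no Cython needed
    "example-1.0/example/other.pyx", # generated C missing
]
print(pyx_without_generated_c(listing))
```

A check like this could only ever be advisory, in line with the point that the requirement isn't enforceable in any practical sense.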
(I guess I do have some generic preference that we not insist on PEPs serving as end-user documentation -- the intended audience here is experts, the definitions are written to mean exactly what they say, etc., and there are real trade-offs between being precise and being easily comprehensible by non-experts. But I also would like you to be happy :-).)
Agreed we don't intend these things to be for end users. But I think it's important that the experts have something detailed and precise, as ultimately they'll have to implement code based on the PEP. And worse still, anyone wanting to implement an alternative to pip has a right to expect that everything they need is in a PEP, not in "people's understanding".
I don't know if it's clear (I hope it is but it's hard to be sure :-)) but my comments are from the perspective of someone who knows the internals of pip, but would like to be able to (re-) write it without ever having to refer to pip's code in order to do so. I think that's a reasonable goal to aim for, as not being able to do that is precisely what got us into the mess where we daren't touch distutils because we don't know what it's supposed to do other than "what it does"...
Thanks for considering my happiness :-) It's not too easy to make me miserable, so don't worry - the big issue is that I enjoy long complex detail-oriented debates, so you're better off not trying *too* hard to increase my happiness in that direction!!! :-)
Paul
participants (4): Donald Stufft, Nathaniel Smith, Paul Moore, Robert Collins