
On 9 November 2015 at 17:21, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Nov 9, 2015 at 7:34 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 9 November 2015 at 05:20, Nathaniel Smith <njs@pobox.com> wrote:
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
I still find these two definitions unhelpful, sorry.
We don't *need* an interface to install from a source tree. It's entirely feasible to have a standard interface to build a sdist from a source tree and go source tree -> sdist -> wheel -> install. That doesn't cater for editable installs, nor does it cater for reusing things like object files from previous builds, so there may be *benefits* to having a richer interface than this, but it's wrong to say it's needed.
I am confused. All that sentence is saying is that (a) it is useful to have the phrase "source tree" as distinct from "sdist" so we can talk about them (which I assume you agree about because you use that phrase in your response :-)),
Agreed.
and (b) there must be *some* interface that allows people to type "pip install some-directory/" and have it work because that's a feature we have to support (which I assume you agree about because you immediately propose an interface for supporting that feature).
Are we talking at cross purposes here? The end user interface "pip install directory" is OK. What I think this PEP is saying is that we need a way for pip to *implement* that functionality in terms of primitive operations that the "source tree" must support. That, again, I'm fine with. But you're then saying (I think) that the primitive operation a source tree must provide is an "install" operation - and that's what I fundamentally disagree with. The source tree should provide a "build" primitive.

If we agree on that (which I think we do, but I don't think the PEP says so), then there's still a further point, on which I think we do disagree, and that's over sdists. I think that there are *two* steps within the build process, and these need to be separated out:

1. Make a structured archive of the project's sources. This includes creation of all generated source files that can be created in a target-independent way. This would include (static) metadata, generated source files such as cython output, etc. The point about this archive is that it is fully target-independent, and does not require any tools to build it that are not fundamentally target-dependent. This is what I consider to be the "sdist". There should only ever need to be one sdist for a given name/version of a project, precisely because it's totally portable, by design.

2. Create target-dependent installable wheels. This is the "build" step, in the sense that it's when you run a compiler to create platform-specific binaries.

With this model, the install process is specifically

    source tree ---> sdist ---> wheel ---> installed package

It is possible that tools could merge some of these steps, but a generic tool like pip that manages the running of the steps in an appropriate order needs to work in terms of the fundamental building blocks.
So I am strongly opposed to proposals that treat source tree ---> wheel as a primitive operation, because they hamper pip's ability to manage things at the level of the fundamental steps. One of the worst aspects of distutils, and one that pip is still far from free of, is the fact that distutils provides merged steps like source tree ---> installed package, and we (mistakenly, in hindsight) used them to "optimise" the way pip works. It did optimise things in some ways, I guess, but it makes it really hard to disentangle things when we want to modularise processing.

The above is of course idealised. Editable installs are one example of something that simply doesn't follow this pattern, and as far as I can see they make no sense *except* as a source tree --> editable install one-step operation. Also, modularising the steps to this extent does have downsides - separating source tree --> sdist and sdist --> wheel makes it harder to do "in place rebuild" optimisations.

We can agree or disagree on the trade-offs, or we can work on trying to get the best of both worlds, but I still think we should be starting (certainly when working at the spec/PEP level) from a clean conceptual model.
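To make the chain described above concrete, here is a minimal Python sketch of a tool that installs from a source tree using only the two primitive steps. Every name in it (SourceTree, Sdist, Wheel, Backend, ToyBackend, install_from_tree) is hypothetical and invented for illustration; none of them is pip's API or anything the PEP under discussion defines, and the "install" is just a dictionary update.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class SourceTree:
    path: str  # something like a VCS checkout


@dataclass(frozen=True)
class Sdist:
    name: str
    version: str  # target-independent: one per name/version


@dataclass(frozen=True)
class Wheel:
    name: str
    version: str
    platform_tag: str  # target-dependent


class Backend(Protocol):
    """Hypothetical interface exposing only the two primitive steps."""

    def build_sdist(self, tree: SourceTree) -> Sdist: ...

    def build_wheel(self, sdist: Sdist, platform_tag: str) -> Wheel: ...


class ToyBackend:
    """Stand-in backend that returns fixed metadata instead of building."""

    def build_sdist(self, tree: SourceTree) -> Sdist:
        return Sdist(name="example", version="1.0")

    def build_wheel(self, sdist: Sdist, platform_tag: str) -> Wheel:
        return Wheel(sdist.name, sdist.version, platform_tag)


def install_from_tree(
    backend: Backend,
    tree: SourceTree,
    platform_tag: str,
    installed: Dict[str, Wheel],
) -> Wheel:
    """source tree -> sdist -> wheel -> installed, one primitive at a time."""
    sdist = backend.build_sdist(tree)
    wheel = backend.build_wheel(sdist, platform_tag)
    installed[wheel.name] = wheel  # placeholder for the real install step
    return wheel
```

Because the managing tool calls each step itself, it can stop after any of them, for example to cache the sdist or the wheel, which is the kind of control a merged source tree ---> installed package step takes away.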
It sounds like we do disagree about the details of what this interface should look like and thus how "pip install some-directory/" should work internally, but that's not a problem with the definition (or indeed something that this PEP's text currently takes any stance on at all :-)).
As I say, I think we're talking at cross purposes. I read the PEP as trying to specify (the wrong) primitives for pip to use. I'm not sure what you intend the PEP to say - maybe that "pip install <directory>" is the canonical install command? I don't think that needs a PEP, it's just how pip works (and other tools may choose to expose things in a different manner).
I suspect you're reluctant to require a "source tree -> sdist" interface, because the author of flit isn't comfortable with having such a thing. That's OK - if you want to note that a benefit of going direct to install (or wheel) is that tools that don't allow you to create a sdist are supported, then let's make that explicit. Expect plenty of pushback on the idea of tools that don't supply sdists though...
I actually haven't talked to Thomas about this particular point at all, and actually part of what started all this was my looking at flit and going "this is cool, but c'mon, you can't just throw away sdists" :-).
The reason I'm reluctant to require a "source tree -> sdist" interface is described here: https://mail.python.org/pipermail/distutils-sig/2015-November/027636.html
and also at the very top of this long email (which for some reason I can't seem to find in the mail.python.org archives?): https://www.mail-archive.com/distutils-sig@python.org/msg23144.html
The TL;DR is: obviously we need source tree -> sdist operations somewhere, and obviously we need mechanisms to increase the reliability of builds -- we all agree that there's some irreducible complexity there, those issues need to be addressed, the question is just where to put that complexity. I think putting it into the PEP for the build frontend <-> build backend interface is the wrong place, because it increases spec complexity (the worst kind of complexity) and it rules out the useful feature of incremental rebuilds. (And by "useful feature" there I mean "if we regress from distutils by failing to support this, then there's a good chance downstream devs will simply refuse to use our new design".)
But here I think we have a new term that's adding confusion. Pip isn't a "build frontend". In 99% of cases pip does no building at all. Basically, pip is a manager of build and install steps, and to manage those steps successfully, it needs clear definitions of the steps involved. In the extreme case, if there's a step "take a source tree and install it" you've left nothing for pip to manage, and you may as well go back to setup.py install.

I think that extracting and formalising the fundamental ("atomic" if you like) steps that constitute going from a source tree to an installed package, is precisely the sort of simplification a spec/PEP *must* do. In doing so, there are engineering trade-offs such as how we reintroduce incremental rebuilds without compromising the model. Such trade-offs may imply a need to add complexity to the spec (maybe in terms of optional "combined" steps such as source tree --> wheel), but it should be clear that these are (a) optional (as in, the process works fine with just the atomic steps) and (b) optimisations (as in, they can't alter the ultimate behaviour as defined in terms of atomic steps).
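The "optional combined step" idea can be sketched roughly as follows. This is only an illustration under assumed names: tree_to_wheel, AtomicOnlyBackend, IncrementalBackend and the build_wheel_from_tree hook are all hypothetical, and the tuples stand in for real artifacts.

```python
def tree_to_wheel(backend, tree, platform_tag):
    """Use the backend's optional combined hook if it has one,
    otherwise fall back to the two atomic steps."""
    combined = getattr(backend, "build_wheel_from_tree", None)
    if combined is not None:
        # (a) optional: only used when the backend provides it
        return combined(tree, platform_tag)
    sdist = backend.build_sdist(tree)
    return backend.build_wheel(sdist, platform_tag)


class AtomicOnlyBackend:
    """Hypothetical backend that offers only the atomic steps."""

    def build_sdist(self, tree):
        return ("sdist", tree)

    def build_wheel(self, sdist, platform_tag):
        return ("wheel", sdist[1], platform_tag)


class IncrementalBackend(AtomicOnlyBackend):
    """Hypothetical backend that adds a combined step as a shortcut."""

    def build_wheel_from_tree(self, tree, platform_tag):
        # A real backend might reuse object files from a previous build here.
        # (b) optimisation: the result must match the atomic path.
        return ("wheel", tree, platform_tag)
```

Under these assumptions both backends produce the same wheel for the same tree and platform, so a conformance check for the combined hook could simply compare its output with that of the atomic path.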
Certainly your definition of a sdist is general enough that it doesn't preclude such things. But on the other hand, it doesn't offer any suggestion that this is an important feature of a sdist (and it is - I say that as someone who has needed to build wheels from a sdist and doesn't have Cython installed). From your definition, people will infer that zipping up a development directory makes a sdist, and so that's what they'll do. Because after all, making Cython a build requirement and generating the C at build time is *also* an option, it's just not as friendly to the average user.
Hmm, I certainly agree that it doesn't preclude such things, because I am very aware of this use case (I maintain projects that handle Cython in exactly the way you describe), and it never occurred to me that this could *not* be supported :-). I'm not sure what you're worried about exactly? Right now, zipping up a development directory actually is a valid way of making an sdist, and nonetheless projects actually do go to elaborate lengths to trick distutils into including generated .c files. So I don't think it's likely they'll stop because of some PEP that neglected to explicitly point out that this was possible :-). But if you think the wording could be improved I'm certainly open to that.
I think that we currently have so much confusion over "what a sdist is" that a new over-general definition isn't going to help. What we need to do is to *pin down* the definition of a sdist, not allow the term to continue to mean too much (and hence, ultimately, very little).

Does my definition of a sdist above in terms of being target-independent but containing all files that can be generated in a target-independent way clarify what I'm intending? I'd be happy if there was wording that left it as optional how much a project needed to eliminate build dependencies by including the output of those dependencies in the sdist, but I'd much prefer it if there was a strong implication that if files could be generated without reference to the target architecture, and doing so eliminated a build dependency, then they should. (To give a specific example, I'd prefer it if it was clear that sdists should always include C sources generated by cython - even though that requirement isn't enforceable in any practical sense).
(I guess I do have some generic preference that we not insist on PEPs serving as end-user documentation -- the intended audience here is experts, the definitions are written to mean exactly what they say, etc., and there are real trade-offs between being precise and being easily comprehensible by non-experts. But I also would like you to be happy :-).)
Agreed we don't intend these things to be for end users. But I think it's important that the experts have something detailed and precise, as ultimately they'll have to implement code based on the PEP. And worse still, anyone wanting to implement an alternative to pip has a right to expect that everything they need is in a PEP, not in "people's understanding".

I don't know if it's clear (I hope it is but it's hard to be sure :-)) but my comments are from the perspective of someone who knows the internals of pip, but would like to be able to (re-) write it without ever having to refer to pip's code in order to do so. I think that's a reasonable goal to aim for, as not being able to do that is precisely what got us into the mess where we daren't touch distutils because we don't know what it's supposed to do other than "what it does"...

Thanks for considering my happiness :-) It's not too easy to make me miserable, so don't worry - the big issue is that I enjoy long complex detail-oriented debates, so you're better off not trying *too* hard to increase my happiness in that direction!!! :-)

Paul