On Sat, Jul 15, 2017 at 3:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 15 July 2017 at 10:42, Nathaniel Smith <njs@pobox.com> wrote:
Hi Paul,
We seem to have some really fundamental miscommunication here; probably we should figure out what that is instead of continuing to talk past each other.
Agreed. Thanks for summarising your understanding. Let's see if I can clarify what I'm saying.
Thanks for your patience!
As a wild guess... can you define what an "out-of-place build" means to you?
I'm going to do this without reference to your explanation, as trying to put things in the context of what you say is where I'm getting messed up. I'll comment on your explanations below.
Again, I'll start with some background. My concern is where we're trying to deal with a user doing "pip install ." on their development directory. This is not a core use case for pip, and honestly I don't feel it's the right workflow (people wanting to work on code and install it as they go along should be using editable installs, IMO), but it is something we see people doing, and I don't want to break that workflow arbitrarily. Given that this is the case we're talking about, my experience is that working directories contain all sorts of clutter - small test files I knocked up, experimental changes I discarded, etc. That may simply reflect the way I work, but comments I've seen indicate that I'm not *completely* alone. So for me, the point here is about making sure that "pip install ." in a cluttered working directory results in "what the developer wants".
Oh yeah, I end up with all kinds of junk in my working directories too.
For me the key property I'm looking for is that the developer gets consistent results for the build commands (i.e., build_wheel and build_sdist->build a wheel give the same wheel). This is important for a number of reasons - to avoid publishing errors where the developer builds a wheel and a sdist and deploys them, to ensure that tox (which uses sdists) gives the same results as manually running the tests, etc. In one of your posts you characterised the sorts of discrepancies I'm trying to avoid as "weird errors" and that's precisely the point - we get confused users raising issues and I want to avoid that happening.
Right.
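To make that property concrete: one way to check it is to build the wheel both ways and compare the two archives member by member. This is a minimal sketch, not anything pip or PEP 517 defines. `wheel_diff` and `member_digests` are hypothetical names, and it assumes both wheels already exist on disk.

```python
import hashlib
import zipfile


def member_digests(wheel_path):
    """Hypothetical helper: map each archive member name to the SHA-256 of its contents."""
    with zipfile.ZipFile(wheel_path) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
        }


def wheel_diff(direct_wheel, via_sdist_wheel):
    """Hypothetical consistency check: compare a wheel from build_wheel with
    one built from the sdist. Returns (only_in_direct, only_in_via_sdist,
    differing). All three are empty when the contents match, regardless of
    zip timestamps or member order."""
    a = member_digests(direct_wheel)
    b = member_digests(via_sdist_wheel)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, differing
```

A stray scratch file that only made it into the directly built wheel shows up in the first list, which is the kind of discrepancy that otherwise surfaces later as a "weird error".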
So, with that in mind, the distinction between an "in place" and an "out of place" build is that an in-place build simply trusts that the developer's directory will deliver consistent results, whereas an out-of-place build does the build in a separate location that we've asked the backend to ensure doesn't contain unexpected files. It has nothing to do with repeated builds (but see below).
...so this is where we diverge. As far as I understand it -- and I'm pretty sure this matches all the major build systems like automake, cmake, etc. -- the *only* difference between an in-place and out-of-place build is where they cache the intermediate artifacts. So if a build system is, say, scanning the source tree looking for stuff to build... well, the source tree is the same either way, so if they're going to pick up random junk in one case, they'll do it in the other case as well. In fact I think they'd consider it a bug if they didn't. Or contrariwise, if a build system is smart enough to recognize that some files are junk and some are not, then it doesn't matter where it's putting the intermediate files, it'll generate good results either way. Or for a more specific example: setuptools has the unfortunate distinction between sdist mode that uses MANIFEST.in, and bdist mode that uses setup.py, and skew between these is a frequent source of problems. But IIUC the only way to exercise the MANIFEST.in path is by actually generating an sdist. You can do in-place builds (the default), or out-of-place builds ('build -b some/dir'), but if one is broken then the other is broken as well.
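A toy sketch of that argument, assuming a made-up build system that selects every *.py file under the source tree: the `build_dir` argument only decides where the intermediate copies go, so the selected file list, junk included, comes out the same either way. `toy_build` is hypothetical and does not model any real backend.

```python
import os
import shutil


def toy_build(source_dir, build_dir, wheel_dir):
    """Hypothetical toy build system. Scans source_dir for *.py files, stages
    them in build_dir (the intermediate-artifact cache), then "packages" them
    by copying from the cache into wheel_dir. build_dir inside source_dir
    models an in-place build; build_dir elsewhere models an out-of-place one."""
    cache = os.path.abspath(build_dir)
    selected = []
    for root, dirs, files in os.walk(source_dir):
        # Skip the cache directory currently in use, wherever it lives.
        dirs[:] = [d for d in dirs if os.path.abspath(os.path.join(root, d)) != cache]
        for name in files:
            if name.endswith(".py"):
                selected.append(os.path.relpath(os.path.join(root, name), source_dir))
    for rel in selected:
        # Stage each file in the cache (the "intermediate artifact")...
        staged = os.path.join(build_dir, rel)
        os.makedirs(os.path.dirname(staged), exist_ok=True)
        shutil.copy2(os.path.join(source_dir, rel), staged)
        # ...then "package" it from the cache into the output directory.
        out = os.path.join(wheel_dir, rel)
        os.makedirs(os.path.dirname(out), exist_ok=True)
        shutil.copy2(staged, out)
    return sorted(selected)
```

Running it out-of-place after an earlier in-place run would additionally pick up the leftover build/ contents, since this toy only skips the cache directory it is currently using. That is the "not pristine" situation discussed further down.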
For me, the distinction between an in-place and out-of-place build is, ... well, first some background to make sure my terminology is clear: build systems typically work by taking a source tree as input and executing a series of rules to generate intermediate artifacts and eventually the final artifacts. Commonly, as an optimization, they have some system for caching these intermediate artifacts, so that future builds can go faster (called "incremental builds"). However, this optimization is often heuristic-based and therefore introduces a risk: if the system re-uses a cached artifact that it should have rebuilt, then this can generate a broken build.
There are two popular strategies for storing this cache, and this is what "in-place" versus "out-of-place" refers to.
"In-place builds" have a single cache that's stored inside the source tree -- often, but not always, intermingled with the source files. So a classic 'make'-based build where you end up with .o files next to all your .c files is an in-place build, and so is 'python setup.py build' putting a bunch of .o files inside the build/ directory.
"Out-of-place builds" instead place the cached artifacts into a designated separate directory. The advantage of this is that you can potentially work around limitations of the caching strategy by having multiple caches and switching between them.
[In traditional build systems the build tree concept is also often intermingled with the idea of a "build configuration", like debug versus optimized builds, and this changes the workflow in various ways -- but we don't have those, and it's a whole extra set of complexity, so let's ignore that.]
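For a concrete feel for the heuristic risk mentioned above, here is a make-style timestamp check, sketched under the assumption that one build rule maps one source file to one cached artifact. `needs_rebuild` is a hypothetical helper, not any particular tool's API.

```python
import os


def needs_rebuild(source_path, artifact_path):
    """Hypothetical make-style check: rebuild if the cached artifact is
    missing or older than its source. Cheap, but it trusts file timestamps."""
    if not os.path.exists(artifact_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(artifact_path)
```

If a source file is edited but ends up with an older mtime than its artifact (say after a checkout, a restore from backup, or clock skew), the check returns False and the stale artifact gets reused. Throwing the cache away, or starting from a fresh out-of-place directory, sidesteps exactly that class of broken incremental build.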
For me, all of the above comes under the heading of "incremental builds", and I'm considering that out of scope. Specifically, pip's current behaviour offers no (documented) means of choosing between incremental or clean builds, and users who want that level of control should be building with the backend tools (setuptools) directly, and only using pip for the install step once a wheel has been built.
If and when we discuss a UI in pip for requesting incremental or clean builds, we'd look at the implications on the backend hooks at that point - but I'm not sure that'll ever be something we want to do, as it seems like that should probably always be a use case that we'd want users to be working directly with the backend for (but that's just my opinion).
Well, so this is part of what's been making me so confused :-). AFAICT the only reason to care about in-place versus out-of-place builds in PEP 517 is if we want to provide explicit control over incremental builds, which seems a weird place to be spending our complexity budget to me too.
Corollaries:
- if you're starting with a pristine source tree, then "in-place" and "out-of-place" builds will produce exactly the same results, because they're running exactly the same rules. (This is why I'm confused about why you seem to be claiming that out-of-place builds will help developers avoid bugs that happen with in-place builds... they're exactly the same thing!)
Agreed, but I'm concerned about build trees that *aren't* "pristine", insofar as they are working directories for development. All of your corollaries depend on the idea that you have a "pristine" build tree, and that's where our confusion lies, I suspect.
Ah, right -- when I said "pristine" here I was implicitly thinking "pristine with respect to intermediate build artifacts", since as far as I know the other kind of pristine is orthogonal to this whole discussion.
- if you've done an out-of-place build in a given tree, you can return to a pristine source tree by deleting the out-of-place directory and making a new one, without having to deal with the build backend. If you've done an in-place build in a given tree, then you need something like a "make clean" rule. But if you have that, then these are identical, which is why I said that it sounded like pip would be just as happy with a way to do a clean build. (I'm not saying that the spec necessarily needs a way to request a clean build -- this is just trying to understand what the space of options actually is.)
Again, agreed but irrelevant, as "pristine" is not the case that concerns me.
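A rough sketch of the two ways of getting back to a clean state described in that corollary. Both helpers are hypothetical, and the `*.o` pattern is an assumption standing in for whatever a real build system knows about its own outputs.

```python
import glob
import os
import shutil


def clean_out_of_place(build_dir):
    """Hypothetical: the whole cache lives in one directory, so discarding it
    needs no knowledge of the build rules."""
    shutil.rmtree(build_dir, ignore_errors=True)
    os.makedirs(build_dir)


def clean_in_place(source_dir, artifact_patterns=("*.o",)):
    """Hypothetical "make clean": artifacts are intermingled with sources, so
    the rule must know which files are outputs. The default pattern is an
    assumed stand-in for a real build system's knowledge of its outputs."""
    removed = []
    for pattern in artifact_patterns:
        for path in glob.glob(os.path.join(source_dir, "**", pattern), recursive=True):
            os.remove(path)
            removed.append(os.path.relpath(path, source_dir))
    return sorted(removed)
```

Neither helper touches unrelated clutter such as a stray notes file: a clean build of either kind only removes build artifacts, which is why it doesn't address the cluttered-working-directory case.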
- if you're starting with a pristine source tree, and your goal is to end up with a wheel *while keeping the original tree pristine*, then some options include: (a) doing a copytree + in-place build on the copy, like pip does now, (b) making an sdist and then doing an in-place build of the unpacked sdist.
Same again
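Option (a) from the corollary above can be sketched with the standard library alone. This assumes the backend can be modelled as a callable that builds in place in whatever tree it is handed; `build_on_copy` and that callable are hypothetical.

```python
import os
import shutil
import tempfile


def build_on_copy(source_dir, build):
    """Hypothetical frontend helper for option (a): copy the whole tree to a
    scratch location and run an in-place build there, so source_dir stays
    untouched. `build` is a stand-in for the backend; it receives the copy's
    path and must return something that outlives the scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        work = shutil.copytree(source_dir, os.path.join(scratch, "src"))
        return build(work)
```

Because copytree copies everything, any clutter in the working directory comes along too, which is the objection raised further down.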
- if you're not starting with a pristine source tree -- like say the user went and did an in-place build here before invoking pip -- then you have very few good options. Copytree + in-place build will hopefully work, but there's a chance you'll pick up detritus from the previous build. Out-of-tree builds might or might not work -- I've given several examples of extant build systems that explicitly disclaim any reliability in this case. Sdist + build the sdist is probably the most reliable option here, honestly, when it's an option.
Your idea of a "not pristine" tree differs from mine - having done an in-place build is the most innocuous example of a non-pristine tree as far as I'm concerned, and the easiest to deal with (make clean).
* Copytree is certain *not* to work, because it copies all the things that make the tree not pristine.
* Build sdist and unpack is pip's current planned approach, but Thomas had concerns that building an sdist isn't always guaranteed to be possible. We do *not* want to have cases where pip can't build a wheel even though build_wheel would have worked, which means build sdist and unpack is a problem.
Right -- the idea in my simplified proposal is to expose just enough to give frontends the flexibility to support both of these, plus fallback from the latter to the former when necessary, so it's at least no worse than what we have now.
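One possible shape for that frontend logic: try sdist, unpack, build, and fall back to copytree then build when the backend signals it can't make an sdist. This is a sketch only. The backend object, its two-argument build_sdist/build_wheel methods, and the `SdistUnsupported` exception are hypothetical stand-ins and do not match the PEP 517 hook signatures.

```python
import os
import shutil
import tarfile
import tempfile


class SdistUnsupported(Exception):
    """Hypothetical: raised by the backend when it cannot make an sdist now."""


def build_wheel_prefer_sdist(source_dir, backend, wheel_dir):
    """Try sdist -> unpack -> build; fall back to copytree -> build.

    `backend` is a hypothetical object whose build_sdist(tree, out_dir) and
    build_wheel(tree, out_dir) return the basename of the file they wrote.
    These are NOT the PEP 517 hook signatures.
    """
    with tempfile.TemporaryDirectory() as scratch:
        try:
            sdist_dir = os.path.join(scratch, "sdist")
            os.makedirs(sdist_dir)
            sdist_name = backend.build_sdist(source_dir, sdist_dir)
            unpack_dir = os.path.join(scratch, "unpacked")
            # Use the safe "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            with tarfile.open(os.path.join(sdist_dir, sdist_name)) as tf:
                tf.extractall(unpack_dir, **extra)
            # Assume the sdist unpacks to a single top-level directory.
            (top,) = os.listdir(unpack_dir)
            tree = os.path.join(unpack_dir, top)
        except SdistUnsupported:
            # Fallback: no worse than today's copytree behaviour.
            tree = shutil.copytree(source_dir, os.path.join(scratch, "copy"))
        return backend.build_wheel(tree, wheel_dir)
```

The design point is just the ordering: the sdist route filters out working-directory clutter when it's available, and the fallback keeps pip able to build a wheel whenever build_wheel alone would have worked.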
* Asking the backend to make a "clean" directory would work (the backend should know what it needs) - that was what the prepare_directory hooks were for. But that got too complex.
* Telling the backend we want a build that's isolated from the source directory, and trusting it to do the right thing, is where we've currently ended up.
Based on the current discussion, however, I now have concerns that either
a) Backend developers might not understand what build_directory is requesting, or
b) The PEP doesn't define the semantics of build_directory in a way that delivers the results I'm suggesting here.
Having had this discussion, and re-read the current draft of the PEP, I do in fact think that (b) is the case. That worries me, because I don't think it's just me that has made that mistake. Nick has just posted a message saying:
Requesting an out-of-tree wheel build is then just a way for a frontend to say to the backend "Hey, please build the wheel *as if* you'd exported an sdist and then built that, even if you can't actually export an sdist right now".
So my understanding is that that's what the build_wheel operation is -- like, backends should *always* be generating the same wheel as they would if they'd built an sdist, and if they don't, it's a bug. Of course bugs do happen, and distutils's fundamental architecture has some mistakes in it, and it makes sense that pip wants to try and be robust against them. But that requires some specific strategy for avoiding specific bugs, and I'm not sure what you have in mind here. To me having two different ways to run the build just seems like it gives twice as many chances for bugs (or maybe more once you take interactions into account). Similarly we could have a literal boolean flag that is documented to mean "please build the wheel *as if* you'd exported an sdist and then built that", but what specifically would you expect backends to do differently if this was set? Are there any circumstances where you wouldn't want this to be set?

-n

--
Nathaniel J. Smith -- https://vorpus.org