[Distutils] Provisionally accepting PEP 517's declarative build system interface

Donald Stufft donald at stufft.io
Fri Jun 2 23:38:34 EDT 2017

> On Jun 2, 2017, at 10:14 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Fri, Jun 2, 2017 at 9:39 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> Note that there's an implication here - if we define the build process
>> in terms of the effect of "going via a sdist", then we need to at
>> least have an intuitive understanding of what that means in practice.
>> I don't think it's a contentious point (even if the specific term
>> "sdist" is open to debate), as I think repeatable builds are a
>> well-understood idea. (It's at this point that the concerns of people
>> who want incremental builds come in - we should support incremental
>> builds in a way that preserves the "just like going via a sdist"
>> principle. But again, they need to raise their concerns if they think
>> we're missing something key to their use case).
> So far my belief is that packages with expensive build processes are
> going to ignore you and implement, ship, document, and recommend the
> direct source-tree->wheel path for developer builds. You can force the
> make-a-wheel-from-a-directory-without-copying-and-then-install-it
> command have a name that doesn't start with "pip", but it's still
> going to exist and be used. Why wouldn't it? It's trivial to implement
> and it works, and I haven't heard any alternative proposals that have
> either of those properties. [1]

If someone wants to implement a direct-to-wheel build tool and have it compete with ``pip install .``, they’re more than welcome to. Competition is healthy, and in the very worst case it validates one of two ideas. Either direct-to-wheel is important enough that people will gladly overcome the relatively small barrier of having to install another tool, in which case we have data to indicate that maybe we need to rethink things; or it’s not important enough, and we leave things as they are.

I went and looked through all 105 pages of pip’s issues (open and closed) and made several searches using any keyword I could think of, looking for any issue where someone asked for this. The only people I could find asking for this were you and Ralf Gommers, as part of the extended discussion around this set of PEPs; I’ve not been able to find a single other person asking for it or complaining about it.

However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with: https://github.com/pypa/pip/issues/178, which was caused by the build system reusing the build directory between two different virtual environments, resulting in an invalid installation. The ticket is old enough that I can’t get at the specifics of it, because it was migrated over from Bitbucket. However, the fact that we *used* to do exactly what you want, and that it caused exactly one of the problems I was worried about, suggests to me that pip is absolutely correct in keeping this behavior.

> Relatedly, the idea of a copy_files hook doesn't make sense to me. The
> only reason pip wants to force builds through the sdist phase is
> because it doesn't trust the backend to make clean wheels, and it's
> willing to make its local directory builds much slower to get that
> guarantee. When you add copy_files, you lose that guarantee *and*
> you're still making local directory builds much slower, so what's the
> point? If the always-via-sdist plan doesn't work for either the
> simplest cases (flit) or the most complex (incremental builds), then
> is it really a good plan?

It’s not that I don’t trust the backend; it’s that I believe in putting in place systems that make it harder to do the wrong thing than the right thing. As it is now, building in place correctly requires the build backend to do extra work to ensure that a file that wouldn’t be included in the sdist doesn’t influence the build in some way. Given that I’m pretty sure literally every build tool in existence for Python currently fails this test, I think it’s pretty reasonable to say that it might continue to be a problem in the future.

Copying the files makes the wrong thing harder to do (but still easier than always going through the sdist). If you want to argue that we should always go through the sdist and we shouldn’t have a copy_files hook, I’m OK with that. I’m only partially in favor of the hook as a performance trade-off, because I think it passes a high enough bar: mistakes are unlikely enough (and when they do happen, they’ll be more obvious).

> … <snip> ...
> - 'flit' adds code to make sdist-from-sdist work. (One way: when
> building an sdist from a VCS checkout, make a list of all the
> ancillary files and include it in the resulting sdist. Or possibly
> just a list of all files + hashes. When asked to make an sdist from an
> arbitrary directory, check for this file, and if present use it as the
> list of ancillary files to include, and possibly check if any hashes
> have changed, and if so change the version number of the resulting
> sdist by appending "+dirty" or something; otherwise, use the current
> VCS-based system.)

This seems reasonable to me.
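
As a rough illustration of the manifest idea quoted above (this is not flit's actual implementation), the sketch below records a list of files with SHA-256 hashes, then later appends "+dirty" to the version if any listed file has changed or gone missing. The manifest file name and all function names are hypothetical, chosen only for this example.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical file name for this sketch; not something flit defines.
MANIFEST_NAME = "sdist-manifest.json"


def write_manifest(root, relative_paths):
    """Record each listed file with the SHA-256 hex digest of its contents."""
    root = Path(root)
    manifest = {
        rel: hashlib.sha256((root / rel).read_bytes()).hexdigest()
        for rel in sorted(relative_paths)
    }
    (root / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2))
    return manifest


def sdist_version(root, base_version):
    """Return the version to use for an sdist built from an unpacked tree.

    Returns None when there is no manifest, so the caller can fall back
    to the VCS-based file listing instead.
    """
    root = Path(root)
    manifest_path = root / MANIFEST_NAME
    if not manifest_path.is_file():
        return None
    recorded = json.loads(manifest_path.read_text())
    for rel, digest in recorded.items():
        path = root / rel
        if not path.is_file():
            return base_version + "+dirty"  # a listed file was removed
        if hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            return base_version + "+dirty"  # a listed file was modified
    return base_version
```

Note that this sketch only detects changes to files already in the manifest; files added after unpacking would need separate handling.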

> One thing that's not clear to me: a crucial use case for sdists is (1)
> download, (2) unpack, (3) patch the source, possibly adding new files,
> (4) build and install. (After all, the whole reason we insist on
> distributing sdists is that open source software should be modifiable
> by the recipient.) Does flit currently support this, given the
> reliance on VCS metadata?
> Other unresolved issues:
> - Donald had some concerns about get_wheel_metadata and they've led to
> several suggestions, none of which has made everyone go "oh yeah
> obviously that's the solution". To me this suggests we should go ahead
> and drop it from PEP 517 and add it back later if/when the need is
> more obvious. It's optional anyway, so adding it later doesn't hurt
> anything.

My main concern is the metadata diverging between the get_wheel_metadata and the building wheel phase. The current PEP solves that in a reasonable enough way (and in a way I can assert against). My other concerns are mostly just little API niggles to make it harder to mess up.

I think this one is important to support because we do need to be able to get at the dependencies, and invoking the entire build chain to do that seems like it will be extraordinarily slow.
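
To make the "assert against" point concrete, here is a minimal sketch of the kind of check a frontend could run: compare the metadata obtained early with the METADATA file inside the wheel that is eventually built. The function names are hypothetical, and the sketch assumes the early metadata is available as a METADATA file on disk; it is not taken from pip or from the PEP.

```python
import zipfile
from pathlib import Path


def read_wheel_metadata(wheel_path):
    """Return the raw bytes of the .dist-info/METADATA file inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            parts = name.split("/")
            if (
                len(parts) == 2
                and parts[0].endswith(".dist-info")
                and parts[1] == "METADATA"
            ):
                return wheel.read(name)
    raise ValueError("no .dist-info/METADATA found in %s" % wheel_path)


def assert_metadata_consistent(early_metadata_path, wheel_path):
    """Raise AssertionError if the early metadata differs from the built wheel's."""
    early = Path(early_metadata_path).read_bytes()
    built = read_wheel_metadata(wheel_path)
    if early != built:
        raise AssertionError(
            "metadata diverged between the metadata hook and the wheel build"
        )
```

A byte-for-byte comparison is the strictest possible rule; a real frontend might reasonably compare parsed fields instead.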

> - It sounds like there's some real question about how exactly a build
> frontend should handle the output from build_wheel; in particular, the
> PEP should say what happens if there are multiple files deposited into
> the output dir. My original idea when writing the PEP was that the
> build frontend would know the name/version of the wheel it was looking
> for, and so it would ignore any other files found in the output dir,
> which would be forward compatible with a future PEP allowing
> build_wheel to drop multiple wheels into the output dir (i.e., old
> pip's would just ignore them). It's clear from the discussion that
> this isn't how others were imagining it. Which is fine, I don't think
> this is a huge problem, but we should nail it down so we're not
> surprised later.

How do you determine the name/version for ``pip install .`` except by running get_wheel_metadata or build_wheel or build_sdist?
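
For reference, the filtering approach described in the quoted text could look roughly like the sketch below, which relies on the wheel filename convention (name-version[-build]-python-abi-platform.whl). The function names are hypothetical. Note that it takes the expected name and version as parameters, which is exactly the information the question above asks about.

```python
import re
from pathlib import Path


def normalize(name):
    """Normalize a project name so that '-', '_' and '.' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_wheel(output_dir, name, version):
    """Return the wheels in output_dir matching name and version, ignoring other files."""
    matches = []
    for path in sorted(Path(output_dir).glob("*.whl")):
        parts = path.name[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            continue  # not a well-formed wheel filename
        if normalize(parts[0]) == normalize(name) and parts[1] == version:
            matches.append(path)
    return matches
```

The version is compared as a plain string here; a real frontend would want to compare parsed versions.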

> -n
> [1] Donald's suggestion of silently caching intermediate files in some
> global cache dir is unreasonably difficult to implement in a
> user-friendly way – cache management is Hard, and I frankly I still
> don't think users will accept individual package's build systems
> leaving hundreds of megabytes of random gunk inside hidden
> directories. We could debate the details here, but basically, if this
> were a great idea to do by default, then surely one of
> cmake/autoconf/... would already do it? Also, my understanding is the
> main reason pip wants to copy files in the first place is to avoid
> accidental pollution between different builds using the same local
> tree; but if a build system implements a global cache like this then
> surprise, now you can get pollution between arbitrary builds using
> different trees, or between builds that don't even use a local tree at
> all (e.g. running 'pip install numpy==1.12.0' can potentially cause a
> later run of 'pip install numpy==1.12.1' to be corrupted). And, it
> assumes that all build systems can easily support out-of-tree
> incremental builds, which is often true but not guaranteed when your
> wheel build has to wrap some random third party C library's build
> system.

Then make it opt-in and build a hash of the directory into the cache key, so that different file contents mean different cache objects. I’m not really sold on the idea that because some developers haven’t decided to do it, it must be a bad idea. Perhaps those build systems are operating under different constraints than we are (I’m almost certain this is the case).
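
A minimal sketch of what "a hash of the directory in the cache key" might mean in practice, assuming the key should change whenever any file path or file content under the source tree changes. The function name is hypothetical, and this is an illustration rather than anything pip implements.

```python
import hashlib
import os


def directory_cache_key(root):
    """Return a hex digest that changes when any file path or content under root changes."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            # Feed in the relative path, a separator, then a fixed-length
            # hash of the contents, so entries cannot run into each other.
            digest.update(rel_path.encode("utf-8"))
            digest.update(b"\0")
            with open(full_path, "rb") as handle:
                digest.update(hashlib.sha256(handle.read()).digest())
    return digest.hexdigest()
```

Two trees with identical relative paths and contents yield the same key, so a cache entry could be shared between them; any edit, addition, or removal yields a different key.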

Donald Stufft
