On Jun 1, 2017, at 6:28 PM, Paul Moore email@example.com wrote:
On 1 June 2017 at 23:14, Thomas Kluyver firstname.lastname@example.org wrote:
On Thu, Jun 1, 2017, at 10:49 PM, Paul Moore wrote:
pip also needs a way to deal with "pip install <local directory>". In this case, pip (under its current model) copies that directory to a working area. In that area, it runs the build command to create a wheel, and proceeds from there. In principle, there's little change in a PEP 517 world. But again, see below.
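The copy-then-build flow described above can be sketched roughly as follows. This is illustrative only, not pip's actual internals: `build_in_temp_copy` and its `build_func` parameter are hypothetical names standing in for "run the build command to create a wheel".

```python
import shutil
import tempfile
from pathlib import Path


def build_in_temp_copy(source_dir, build_func):
    """Copy source_dir to a scratch area and run build_func there.

    Sketch of the frontend-side copy step: the build can write whatever
    debris it likes into the copy without dirtying the original tree,
    and leftover artifacts in the original tree cannot leak into it.
    """
    with tempfile.TemporaryDirectory() as scratch:
        build_dir = Path(scratch) / "build"
        shutil.copytree(source_dir, build_dir)
        # build_func stands in for the actual wheel-building command.
        return build_func(build_dir)
```

The point of the sketch is the isolation property being debated: any files the build creates land in the temporary copy, which is discarded afterwards.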
I still question whether the copying step is necessary for the frontend. Pip does it for setup.py builds (AIUI) because they might modify or create files in the working directory, and it wants to keep the source directory clean of that. Flit can create a wheel without modifying/creating any files in the working directory.
That's a very fair comment, and I honestly don't know how critical the copy step is - in the sense that I know we do it to prevent certain classes of issue, but I don't know what they are, or how serious they are. Perhaps Donald does?
It's certainly true that setup.py based builds are particularly unpleasant for the obvious "running arbitrary code" reasons. But I'm not sure how happy I am simply saying "backends must ..." what? How would we word this precisely? It's not just about keeping the sources clean, it's also about not being affected by unexpected files in the source directory. Consider that a build using a compiler will have object files somewhere. Should a backend use existing object files in preference to sources? What about a backend based on a tool designed to do precisely that, like waf or make? What if the files came from a build with different compiler flags?

Sure, it's user error or a backend bug, but it'll be reported to pip as "I tried to install foo and my program failed when I imported it". We get that sort of bug report routinely (users reporting bugs in build scripts as pip problems) and we'll never have a technical solution to all the ways they can occur, but preventative code like copying the build files to a clean location can minimise them. (As I say, I'm speculating about whether that's actually why we build in a temp location, but it's certainly the sort of thinking that goes into our design).
I suspect the original reasoning behind copying to a temporary location has been lost to the sands of time. We’ve been doing that in pip for as long as I’ve worked on pip, maybe Jannis or someone remembers why, I dunno!
From my end, copying the entire directory alleviates a few problems:
In the current environment, it prevents random debris, including build files, from being written to and cluttering up the current directory.
It reduces errors caused by people or tooling editing files while a build is in progress. This can’t ever be fully eliminated, but copying to a temporary location considerably narrows the window in which someone can inadvertently muck up their build mid-progress.
It prevents some issues with two builds running at the same time.
Narrowing that down to producing a sdist (or some other mechanism for a “copy what you would need” hook) additionally prevents:
Unexpected files changing the behavior of the build.
Misconfigured build tools appearing to “work” in development but failing when the sdist is released to PyPI, or the sdist and wheels differing because the wheel was produced from a VCS checkout but a build from a sdist wasn’t.
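The "build through a sdist" route being discussed could look something like the sketch below. It assumes a backend object exposing the PEP 517 `build_sdist` and `build_wheel` hooks; `build_via_sdist` is a hypothetical frontend helper, and environment isolation and error handling are omitted entirely.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def build_via_sdist(backend, wheel_dir):
    """Build a wheel by going through a sdist first.

    Because the backend must enumerate its own files to produce the
    sdist, unexpected files in the checkout cannot leak into the build,
    and the wheel matches what a build from the released sdist gives.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # 1. Ask the backend which files matter by having it build a sdist.
        sdist_name = backend.build_sdist(tmp)
        # 2. Unpack it: this is the "copy what you would need" step.
        with tarfile.open(Path(tmp) / sdist_name) as tf:
            tf.extractall(tmp)
        unpacked = Path(tmp) / sdist_name[: -len(".tar.gz")]
        # 3. Build the wheel from the unpacked tree.
        old_cwd = os.getcwd()
        os.chdir(unpacked)
        try:
            return backend.build_wheel(wheel_dir)
        finally:
            os.chdir(old_cwd)
```

The hook names and signatures (`build_sdist(sdist_directory)`, `build_wheel(wheel_directory)`) follow the PEP 517 draft under discussion; everything around them is a toy frontend.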
Ultimately you’re right, we could just encode this into PEP 517 and say that projects need to either give us a way to copy the files they need OR they need hygienic builds that do not modify the current directory at all. I greatly prefer not to do that though, because everyone is only human, and there are likely to be build backends that don’t do that— either purposely or accidentally— and it’ll likely be pip that fields those support issues (because users will see it as: they invoked pip, so it must be pip’s fault).
In my mind the cost of requiring some mechanism of doing this is pretty low; the project obviously needs to know what files are important to it, or else how is it going to know what it’s going to build in the first place? For most projects the amount of data that needs to be copied (versus what is just sitting there taking up space) is pretty small, so even on a really slow HDD the copy operation should not take a significant amount of time. It’s also not a particularly hard thing to implement, I think; certainly it’s much easier than actually building a project in the first place.
There’s a principle here at Amazon that goes, “Good intentions don’t matter”. It essentially means that simply saying you’re going to do something good doesn’t count, because you’re inevitably going to forget or mess up; instead of just having the intention to do something, you should have a process in place that ensures it is going to happen. Saying that we’re going to make the copying optional and hope that the build tools correctly build in place without an issue feels like a “good intention” to me, whereas adding the API and step that mandates (through technical means) they do it correctly is putting a process in place that ensures it is going to happen.
— Donald Stufft