On Fri, Jun 2, 2017 at 8:38 PM, Donald Stufft email@example.com wrote: >
On Jun 2, 2017, at 10:14 PM, Nathaniel Smith firstname.lastname@example.org wrote:
So far my belief is that packages with expensive build processes are going to ignore you and implement, ship, document, and recommend the direct source-tree->wheel path for developer builds. You can force the make-a-wheel-from-a-directory-without-copying-and-then-install-it command have a name that doesn't start with "pip", but it's still going to exist and be used. Why wouldn't it? It's trivial to implement and it works, and I haven't heard any alternative proposals that have either of those properties. 
If someone wants to implement a direct-to-wheel build tool and have it
pip install . they’re more than welcome to. Competition is
healthy and at the very worst case it could validate either the idea that
direct-to-wheel is important enough that people will gladly overcome the
relatively small barrier of having to install another tool and then we have
data to indicate maybe we need to rethink things or it could validate the
idea that it’s not important enough and leave things as they are.
I went and looked through all 105 pages of pip’s issues (open and closed) and made several searches using any keyword I could think of looking for any issue where someone asked for this. The only times I can find anyone asking for this were you and Ralf Gommers as part of the extended discussion around this set of PEPs and I’ve not been able to find a single other person asking for it or complaining about it.
That's because until now, the message that everyone has received over and over is that the way you install a package from a directory on disk is:
cd directory python setup.py install
and this does incremental builds. (My experience is that even today, most people are surprised to learn that 'pip install' accepts directory paths.)
In our glorious PEP 517 future, we have to teach everyone to stop using 'setup.py install' and instead use 'pip install .'. This switch enables a glorious new future of non-distutils-based build systems and fixes a bunch of other brokenness at the same time, hooray, BUT currently switching to 'pip install' also causes a regression for everyone who's used to incremental builds working.
Ralf and I noticed this because we were looking at getting a head start on the glorious future by making 'pip install' mandatory for numpy and scipy. The reason no-one else has noticed is that we're among the few people that have tried using 'pip install' as their standard install-from-working-tree command. But soon there will be more.
However, what I was able to find was what appears to be the original reason pip started copying the directory to begin with, https://github.com/pypa/pip/issues/178 which was caused by the build system reusing the build directory between two different virtual environments and causing an invalid installation to happen. The ticket is old enough that I can get at specifics it because it was migrated over from bitbucket. However the fact that we used to do exactly what you want and it caused exactly one of problem I was worried about seems to suggest to me that pip is absolutely correct in keeping this behavior.
Hmm, it looks to me like that bug is saying that at the time, if you ran 'python setup.py install' inside the pip source tree, and then tried to run pip's test suite (possibly via 'setup.py test'), then it broke. I don't think this is related to the behavior of 'pip install .', and I feel like we would know if it were currently true that running 'setup.py install' twice in the same directory produced broken shebang lines. (Again, most people who install from source directories are currently using setup.py install!)
The source tree copying was originally added in:
(which is dated ~2 months before that bug you found, and if I'm reading it right tweaks a code path that previously only worked for 'pip install foo.zip' so it also works for 'pip install foo/'). AFAICT the reason it was written this way is that pip started out with the assumption that it was always going to be downloading and unpacking archives, so the logic went:
1) make a temporary directory 2) unpack the sdist into this temporary directory 3) build from this temporary directory
Then, when it came time to add support for building from directories, the structure of the logic meant that by the time pip got to step (2) and realized that it already had a source directory, it was too late -- it was already committed to using the selected temporary directory. So instead of refactoring all this code, they made the minimal change of implementing the "unpack this sdist into this directory" operation for source directories by using shutil.copytree.
I think this chain of reasoning will feel very familiar to anyone working with the modern pip source 5 years later...
It's absolutely true that there are cases where incremental builds can screw things up, especially when using distutils/setuptools. But I don't think this is why pip does things this way originally :-).
It’s not that I don’t trust the backend, it’s that I believe in putting in systems that make it harder to do the wrong thing than the right thing. As it is now building in place correctly requires the build backend to do extra work to ensure that some file that wouldn’t be included in the sdist doesn’t influence the build in some way. Given that I’m pretty sure literally every build tool in existence for Python currently fails this test, I think that is a pretty reasonable statement to say that it might continue to be a problem into the future.
Copying the files makes that harder to do (but still easier than always going through the sdist). If you want to argue that we should always go through the sdist and we shouldn’t have a copy_files hook, I’m ok with that. I’m only partially in favor of it as a performance trade off because I think it passes a high enough bar that it’s unlikely enough for mistakes to be made (and when they do, they’ll be more obvious).
What do you think of letting build backends opt-in to in-place builds?
Other unresolved issues:
My main concern is the metadata diverging between the get_wheel_metadata and the building wheel phase. The current PEP solves that in a reasonable enough way (and in a way I can assert against). My other concerns are mostly just little API niggles to make it harder to mess up.
I think this one is important to support because we do not to be able to get at the dependencies, and invoking the entire build chain to do that seems like it will be extraordinarily slow.
It's only slow in the case where (a) there's no wheel (obviously), and (b) after getting the dependencies we decide we don't want to install this sdist after all. I imagine numpy for example won't bother implementing get_wheel_metadata because we provide wheels for all the platforms we support and because we have no dependencies, so it is doubly useless AFAICT. But yeah in other cases it could matter. I'm not opposed to including it in general, just thought this might be a way to help get the minimal PEP 517 out the door.
How do you determine the name/version for
pip install . except by
running get_wheel_metadata or build_wheel or build_sdist?
Well, I was imagining that the semantics of 'pip install .' in a multi-wheel world would be to install all the generated wheels :-). But yeah, it's not really well-specified as currently written.
Possibly the simplest solution is to say that build_wheel has to return a string which names the wheel, and then in the future we could add build_wheel2 which is identical but returns a list of strings, and backwards compatibility would be:
def build_wheel2(...): return build_wheel(...)
 Donald's suggestion of silently caching intermediate files in some global cache dir is unreasonably difficult to implement in a user-friendly way – cache management is Hard, and I frankly I still don't think users will accept individual package's build systems leaving hundreds of megabytes of random gunk inside hidden directories. We could debate the details here, but basically, if this were a great idea to do by default, then surely one of cmake/autoconf/... would already do it? Also, my understanding is the main reason pip wants to copy files in the first place is to avoid accidental pollution between different builds using the same local tree; but if a build system implements a global cache like this then surprise, now you can get pollution between arbitrary builds using different trees, or between builds that don't even use a local tree at all (e.g. running 'pip install numpy==1.12.0' can potentially cause a later run of 'pip install numpy==1.12.1' to be corrupted). And, it assumes that all build systems can easily support out-of-tree incremental builds, which is often true but not guaranteed when your wheel build has to wrap some random third party C library's build system.
Make it opt-in
If it's opt-in, then I might as well tell people to run 'pip devinstall .' or 'in-place-install .' or whatever instead, and it'll be much easier all around. But instead of making it opt-in, I'd much rather it Just Work. It's frustrating that at the same time we're moving to the glorious simplified future, we're also picking up a new piece of arcane wisdom that devs will need to be taught, and another place where numerical Python devs will roll their eyes at how the standard Python tooling doesn't care about them. (And I totally understand that the motivation on your end is also to make things Just Work, but I feel like in the specific case where someone is repeatedly building out of the same source directory – which is the one in dispute here – we should optimize for developer experience.)
and build a hash of the directory into the cache key so different file contents mean different cache objects then. I’m not really sold on the idea that the fact some developers haven’t decided to do it then it is a bad idea. Perhaps those build systems are operating under different constraints than we are (I’m almost certainly sure this is the case).
I think the way the constraints differ is just that they don't have this imposed constraint that they must build out of an ephemeral tempdir. If we're operating under that constraint, then your idea is certainly worth considering. My point is that it's much easier to remove that constraint (by switching to using a non-pip tool) than it is to work around it, so that's my prediction for what will happen.
-- Nathaniel J. Smith -- https://vorpus.org