
On Sun, Oct 11, 2015 at 4:49 AM, Paul Moore <p.f.moore@gmail.com> wrote: [...]
> As regards what pip could do, technically you are of course correct (it's possible, it just needs someone willing to make the code changes). I don't know, however, if the approach you're proposing fits with how we currently envisage pip developing. Specifically, my personal goal [1] is that we get to a point where pip can do all of the dependency resolution steps, getting to a point where it knows exactly what it will install, *before* it starts downloading packages, running build steps, etc.
Thanks for stating this so clearly. Unfortunately I just don't see any way this can possibly be achieved. It seems to unambiguously rule out the various "compromise" proposals from the other thread (e.g., your suggestion that packages would have to specify most dependencies statically, but would have the option of adding some extra ones at build time, would not accomplish the goal stated above).

To accomplish this, it really is necessary that we be able to *exactly* predict the full dependency set (and also environment specifiers and potential external system requirements and ... -- anything that possibly affects whether a wheel is installable) before we download any packages or build any wheels. But as discussed at length in the other thread, it's a fact of life that the same source release may be configured in different ways that produce different resulting dependencies. NumPy is one example of this, but it's hardly unusual -- pretty much any optional dependency on a C library works like this. And these kinds of issues will presumably only get more complicated if we start adding more complex dependency structures (e.g. "provides", per-package ABI compatibility tracking, ...).

Are you aware of any other systems that have accomplished anything like this? From a quick skim, it looks like .deb, .rpm, Gentoo, FreeBSD ports, and Ruby gems all allow for arbitrary code execution inside "source" packages when specifying dependencies, and none of the systems I looked at have the property you're looking for. This doesn't prove that it's impossible, but...

I do see one clear path to accomplish what you want:

1) enable Linux wheels on PyPI
2) build an autobuilder infrastructure for PyPI
3) once 95% of packages have wheels, flip the switch so that pip ignores sdists when auto-installing dependencies

This strategy at least has the advantage that it only requires us to do things that have been done before and that we know are possible :-).

And the alternative is -- what? As far as pip goes, it sounds like we all agree that there's a perfectly sound algorithm for solving the installation problem without access to static dependencies (i.e., wrapping the solve-then-build cycle in a loop -- see the sketch below); it's just that this leads to somewhat more complicated code in pip. But pip clearly has to implement and debug this code no matter what, because we are committed to handling traditional sdists for some years yet. So the best we can hope for is that if we impose these constraints on wheels, and we somehow manage to make this work at all (which seems to be a research problem), then we might eventually be able to replace some working code with some slightly simpler working code. (Plus whatever other benefits there are of having static dependency metadata on PyPI, like interesting global static analyses of the package ecosystem.)

I know "Debian packagers think source wheels with static metadata would be great" was cited in the other thread as an example of an advantage, but note that for NumPy, the dependencies in the configuration Debian will want to use are exactly *not* the dependencies in the configuration we'll put on PyPI: the PyPI version wants to be self-contained inside the PyPI ecosystem, while the Debian version wants to rely as much as possible on Debian-distributed libraries. So even if NumPy has a "source wheel" with static dependency metadata, that will be exactly what Debian *can't* use; they'll need a traditional non-static-metadata source release instead.
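Here's a minimal sketch, in Python, of what such a solve-then-build loop might look like. To be clear, none of this is pip's actual code: solve, download_sdist, build_wheel, and read_dependencies are hypothetical stand-ins passed in as callables, and the point is only the shape of the algorithm, not the details:

    # A sketch only -- 'solve' is any resolver that returns a candidate
    # set consistent with the metadata known so far; the other callables
    # are hypothetical stand-ins for pip's download/build machinery.
    def install_plan(top_level_reqs, solve, download_sdist,
                     build_wheel, read_dependencies):
        built_metadata = {}  # candidate -> deps learned by building it
        while True:
            # "Solve": compute a tentative install plan from whatever
            # dependency information we currently have.
            plan = solve(top_level_reqs, built_metadata)
            # "Build": candidates whose real dependencies we haven't
            # observed yet must be built before the plan can be trusted.
            unknown = [c for c in plan if c not in built_metadata]
            if not unknown:
                return plan  # fixpoint: the plan is fully grounded
            for candidate in unknown:
                wheel = build_wheel(download_sdist(candidate))
                built_metadata[candidate] = read_dependencies(wheel)
            # Re-solve: the builds may have revealed dependencies that
            # invalidate the tentative plan, so iterate until stable.

Each pass through the loop learns the real metadata for at least one new candidate, and the candidate space is finite, so the loop terminates; the cost is just that pip may have to re-run the resolver after each round of builds, which is exactly the "more complicated code" I mean above.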
I dunno -- I'm sure there must exist some other ways forward that don't require dropping the dream of static dependencies.

At one extreme, we had a birds-of-a-feather session at SciPy this year on "the future of NumPy", and the most vocal audience contingent was in favor of NumPy simply dropping upstream support for pip/wheels/PyPI entirely and requiring all downstream packages/users to switch to conda or to build by hand. That sounds like a terrible idea to me. But I would find it easier to believe in the PyPI/pip ecosystem if there were some concrete plan for how this all-static world was actually going to work, rather than just chasing rainbows.

-n

--
Nathaniel J. Smith -- http://vorpus.org