[Distutils] What metadata does pip actually need about sdists?

Nathaniel Smith njs at pobox.com
Mon Oct 12 06:53:36 CEST 2015


On Sun, Oct 11, 2015 at 4:49 AM, Paul Moore <p.f.moore at gmail.com> wrote:
[...]
> As regards what pip could do, technically you are of course correct
> (it's possible, it just needs someone willing to make the code
> changes). I don't know, however, if the approach you're proposing fits
> with how we currently envisage pip developing. Specifically, my
> personal goal [1] is that we get to a point where pip can do all of
> the dependency resolution steps, getting to a point where it knows
> exactly what it will install, *before* it starts downloading packages,
> running build steps, etc.

Thanks for stating this so clearly.

Unfortunately I just don't see any way this can possibly be achieved.
That goal seems to unambiguously rule out the various "compromise" proposals
from the other thread (e.g., your suggestion that packages would have
to specify most dependencies statically, but would have the option of
adding some extra ones at build time, would not accomplish the goal
stated above). To accomplish this, it really is necessary that we be
able to *exactly* predict the full dependency set (and also
environment specifiers and potential external system requirements and
... -- anything that possibly affects whether a wheel is installable)
before we download any packages or build any wheels. But as discussed
at length in the other thread, it's a fact of life that the same
source release may be configured in different ways that create
different resulting dependencies. NumPy is one example of this, but
it's hardly unusual -- pretty much any optional dependency on a C
library works like this. And these kinds of issues will presumably
only get more complicated if we start adding more complex dependency
structures (e.g. "provides", per-package abi compatibility tracking,
...).
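
To make the point concrete, here is a toy sketch of how one source release can
produce wheels with different requirements depending on a build-time choice.
The names "basepkg", "fastmath-runtime" and the use_external_blas option are
all made up for illustration; none of this is taken from NumPy's actual build.

```python
# Hypothetical illustration only: "basepkg", "fastmath-runtime" and the
# use_external_blas option are invented names, not any real package's build.

BASE_REQUIREMENTS = {"basepkg"}


def wheel_requirements(use_external_blas):
    """Return the install requirements of the wheel produced by one build."""
    requirements = set(BASE_REQUIREMENTS)
    if use_external_blas:
        # Linking against an external library adds a runtime dependency
        # that a self-contained build of the same source does not have.
        requirements.add("fastmath-runtime")
    return requirements


# Same source release, two configurations, two different dependency sets.
self_contained = wheel_requirements(use_external_blas=False)
distro_style = wheel_requirements(use_external_blas=True)
```

Nothing in the source tree alone says which of the two sets is "the"
dependency set; that is only known once a configuration has been chosen.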

Are you aware of any other systems that have accomplished anything
like this? From a quick skim, it looks like .deb, .rpm, gentoo,
freebsd ports, and ruby gems all allow for arbitrary code execution
inside "source" packages when specifying dependencies, and none of the
systems I looked at have the property that you're looking for. This
doesn't prove that it's impossible, but...

I do see one clear path to accomplish what you want:

1) enable linux wheels on pypi
2) build an autobuilder infrastructure for pypi
3) now that 95% of packages have wheels, flip the switch so that pip
ignores sdists when auto-installing dependencies

This strategy at least has the advantage that it only requires we do
things that have been done before and we know are possible :-).
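
Step 3 could look roughly like the following file-selection policy. This is
only a sketch under assumptions: choose_file and its arguments are
hypothetical names, and it is not how pip's real candidate selection is
written.

```python
# Hypothetical policy sketch; this is not pip's actual file-selection code.

def choose_file(filenames, explicitly_requested):
    """Pick a distribution file for one package.

    Wheels are always preferred. An sdist is considered only when the user
    asked for this package by name, never for an auto-installed dependency.
    Returns None if nothing acceptable is available.
    """
    wheels = [f for f in filenames if f.endswith(".whl")]
    sdists = [f for f in filenames if f.endswith((".tar.gz", ".zip"))]
    if wheels:
        return wheels[0]
    if explicitly_requested and sdists:
        return sdists[0]
    return None
```

With this policy, a dependency that only has an sdist is simply not
installable automatically, so every dependency pip resolves comes with the
static metadata of a wheel.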

And the alternative is -- what? As far as pip goes, it sounds like we
all agree that there's a perfectly sound algorithm for solving the
installation problem without access to static dependencies (i.e.,
wrapping the solve-then-build cycle in a loop); it's just that this
leads to a bit more complicated code in pip. But pip clearly has to
implement and debug this code no matter what, because we are committed
to handling traditional sdists for some years yet. It seems like the
best we can hope for is that if we impose these constraints on wheels,
and we somehow manage to make this work at all (which seems to be a
research problem), then we might eventually be able to replace some
working code with some slightly simpler working code. (Plus whatever
other benefits there are of having static dependency metadata on pypi,
like interesting global static analyses of the package ecosystem.)
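
As a minimal sketch of what "wrapping the solve-then-build cycle in a loop"
might mean: solve with whatever dependency information is known, build one
package whose dependencies are still unknown, record what the build reports,
and solve again. All names here (solve, solve_then_build, the toy data) are
hypothetical, and the "solver" ignores versions and conflicts entirely.

```python
# Hypothetical sketch of a solve-then-build loop. None of these names are
# pip APIs, and the toy "solver" ignores versions and conflicts.

def solve(root, known_deps):
    """Return every package reachable from root using the dependency
    information discovered so far. Packages whose dependencies are still
    unknown are treated as having none for now."""
    chosen, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg in chosen:
            continue
        chosen.add(pkg)
        stack.extend(known_deps.get(pkg, ()))
    return chosen


def solve_then_build(root, static_deps, build):
    """static_deps maps packages with static metadata to their dependencies.
    build(pkg) stands in for building an sdist and returns its real
    dependencies. Returns the final plan and the number of builds done."""
    known = dict(static_deps)
    builds = 0
    while True:
        plan = solve(root, known)
        unknown = sorted(p for p in plan if p not in known)
        if not unknown:
            return plan, builds
        # Build one package with unknown dependencies, then re-solve.
        pkg = unknown[0]
        known[pkg] = build(pkg)
        builds += 1


# Toy data: two packages with static metadata, two that must be built.
STATIC = {"app": ["numpy", "requests"], "requests": []}
DYNAMIC = {"numpy": ["blas-wrapper"], "blas-wrapper": []}

plan, builds = solve_then_build("app", STATIC, lambda pkg: DYNAMIC[pkg])
```

In this toy run the loop needs two builds before the plan stops changing. A
real implementation would also have to cope with a build result invalidating
an earlier choice, which this sketch does not attempt.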

I know "debian packagers think source wheels with static metadata
would be great" was cited in the other thread as an example of an
advantage, but note that for numpy, the dependencies in the
configuration that debian will want to use are exactly *not* the
dependencies in the configuration we'll put on pypi, because the pypi
version wants to be self-contained inside the pypi ecosystem but the
debian version wants to rely as much as possible on debian-distributed
libraries. So even if numpy has a "source wheel" that has static
dependency metadata, then this will be exactly what debian *can't*
use; they'll need a traditional non-static-metadata source release
instead.

I dunno -- I'm sure there must exist some other ways forward that
don't require dropping the dream of static dependencies. At one
extreme, we had a birds-of-a-feather at SciPy this year on "the future
of numpy", and the most vocal audience contingent was in favor of
numpy simply dropping upstream support for pip/wheels/pypi entirely
and requiring all downstream packages/users to switch to conda or
building by hand. It sounds like a terrible idea to me. But I would
find it easier to believe in the pypi/pip ecosystem if there were some
concrete plan for how this all-static world was actually going to
work, and that it wasn't just chasing rainbows.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org

