[Distutils] What metadata does pip actually need about sdists?

Nathaniel Smith njs at pobox.com
Sun Oct 11 06:31:27 CEST 2015


Hi all,

I'm finding it impossible to keep track of that other thread, and I
guess I'm probably not the only one, so I figured I'd try splitting a
few of the more specific discussions out :-).

One thing that seems to be a key issue, but where I remain very
confused, is the question of what pip actually needs from an sdist.
(Not PyPI, just pip or other package install tools.)

Right now, IIUC, there are three times that pip install touches
sdist-related metadata:
1) it uses the name+version that are embedded in the sdist filename to
select an sdist from an index like PyPI
2) after unpacking this sdist it then calls 'setup.py egg_info' to get
the full metadata for the wheel (or wheel equivalent) that this sdist
will eventually produce. Specifically, what it does with this is
extract the setup_requires and install_requires fields and use them
to go find other wheels/sdists that also need to be installed
3) eventually it actually builds the package, and this produces a
wheel (or wheel equivalent) that has its own metadata (which often
matches the metadata from egg_info in step (2), but not always)

Is that a correct description of current behavior? Is there anything
that pip ever looks at besides name, version, dependencies?
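
To make step (1) concrete, here is a minimal sketch of pulling name and
version out of an sdist filename. The helper name and the list of
extensions are assumptions made for illustration; this is not pip's
actual code, and it assumes the version itself contains no hyphen.

```python
# Hypothetical helper for illustration only; not pip's implementation.
SDIST_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tgz", ".zip")


def split_sdist_filename(filename):
    """Split 'name-version<ext>' into (name, version).

    Assumes the version is everything after the last hyphen, so a
    project name may contain hyphens but the version may not.
    """
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError("not a recognized sdist filename: %r" % filename)

    name, sep, version = stem.rpartition("-")
    if not (sep and name and version):
        raise ValueError("cannot find name and version in %r" % filename)
    return name, version


print(split_sdist_filename("numpy-1.10.0.tar.gz"))
print(split_sdist_filename("python-dateutil-2.4.2.zip"))
```

The point is only that name and version are available without unpacking
or executing anything, which is not true of the dependency fields in
step (2).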

Paul says that this is broken, and that pip gets lots of bug reports
that "can be traced back to needing to run setup.py egg-info to get
metadata" [1]. Since AFAICT the only metadata that pip actually
touches is name, version, and dependencies, and it already knows the
name and version before it runs egg_info, I assume that what this
means is that it's crucial for pip to have static access to dependency
information? OTOH in another email Paul says that name and version are
the minimum he wants [2], so maybe I'm reading too much into this :-).

From the discussion so far, it sounds like the particularly crucial
question is whether pip needs to statically know dependencies before
building a wheel.

Trying to reason through from first principles, I can't see any reason
why it would.

It would be somewhat convenient if sdists did list their binary
dependencies: if that were the case, then pip could take a strictly
phased approach:

1) solve the complete dependency graph to find a set of packages to
install / remove
2) for all the packages-to-be-installed that are sdists, turn them into wheels
3) install all the wheels
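
The strictly phased approach above can be sketched roughly as follows.
Everything here is a toy model: the index contents, the "solver" (a
plain transitive closure with no version constraints), and the
function names are all assumptions made for illustration.

```python
# Toy index in which sdists DO list their dependencies statically.
STATIC_INDEX = {
    "app": {"kind": "sdist", "deps": ["lib"]},
    "lib": {"kind": "wheel", "deps": []},
}


def solve_static(roots):
    """Toy solver: transitive closure over statically known deps."""
    selected = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in selected:
            selected.add(name)
            stack.extend(STATIC_INDEX[name]["deps"])
    return sorted(selected)


def phased_install(roots):
    """Return the actions taken, in order, as (action, payload) tuples."""
    log = []
    # Phase 1: solve the complete graph before touching any sdist.
    plan = solve_static(roots)
    log.append(("solve", tuple(plan)))
    # Phase 2: turn every selected sdist into a wheel.
    for name in plan:
        if STATIC_INDEX[name]["kind"] == "sdist":
            log.append(("build", name))
    # Phase 3: install all the wheels.
    for name in plan:
        log.append(("install", name))
    return log


for action in phased_install(["app"]):
    print(action)
```

Note that no build step runs until the solve is finished, which is
exactly what static dependency metadata buys you.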

OTOH if sdists have only name and version statically, but not
dependency information, then you need to do something like:

1) create a fake dependency graph that contains accurate information
for all known wheels, and for each sdist add a fake node that has the
right name and version number but pretends not to have any
dependencies.
2) solve this graph to find the set of packages to install / remove
3) if any of the packages-to-be-installed are sdists, then fetch them,
run egg_info or build them or whatever to get their real dependencies,
add these to the graph, and go to step 1
4) else, we have wheels for everything; install them.
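
As a rough sketch of the loop above, under heavy simplifying
assumptions: the index is a toy dict, the "solver" is a transitive
closure with no version constraints, and discover_sdist_deps() is a
hypothetical stand-in for fetching the sdist and running egg_info or a
build.

```python
# Toy index. For sdists, "deps" is hidden from the solver until the
# sdist has been inspected.
INDEX = {
    "app": {"kind": "sdist", "deps": ["lib", "util"]},
    "lib": {"kind": "wheel", "deps": ["util"]},
    "util": {"kind": "sdist", "deps": []},
    "unused": {"kind": "sdist", "deps": ["lib"]},
}


def discover_sdist_deps(name):
    """Hypothetical stand-in for fetch + egg_info/build."""
    return list(INDEX[name]["deps"])


def solve(roots, known_deps):
    """Toy solver: transitive closure over currently known deps."""
    selected = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in selected:
            selected.add(name)
            stack.extend(known_deps[name])
    return selected


def resolve(roots):
    # Step 1: real deps for wheels, fake dependency-free nodes for sdists.
    known_deps = {
        name: list(info["deps"]) if info["kind"] == "wheel" else []
        for name, info in INDEX.items()
    }
    inspected = set()
    rounds = 0
    while True:
        rounds += 1
        # Step 2: solve with whatever we know so far.
        selected = solve(roots, known_deps)
        pending = sorted(
            n for n in selected
            if INDEX[n]["kind"] == "sdist" and n not in inspected
        )
        if not pending:
            # Step 4: nothing left to inspect; ready to install.
            return selected, inspected, rounds
        # Step 3: learn the real deps of the selected sdists, then re-solve.
        for name in pending:
            known_deps[name] = discover_sdist_deps(name)
            inspected.add(name)


selected, inspected, rounds = resolve(["app"])
print(sorted(selected), sorted(inspected), rounds)
```

In this toy run the "unused" sdist is never fetched or inspected,
because no solution ever selects it.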

(This works because dependencies are constraints -- adding
dependencies can only reduce the space of possible solutions, never
enlarge it. Also, because by the time we decide to fetch and build any
sdists, we already know that we're very likely to want to install
them, so the performance penalty for building packages we turn out not
to want is not high. And, crucially, we know that there exists some
set of dependency metadata which would convince us to install these
sdists, and dependency metadata is under the package author's control,
so we already have established a trust route to the author of this
package -- if they don't declare any dependencies, then we'll be
installing and running arbitrary code of theirs, so running arbitrary
code to check their dependencies doesn't require any additional
trust.)
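
The "constraints can only shrink the solution space" claim can be
checked by brute force on a tiny example. This is a deliberately
simplified model (every package is always installed, and a constraint
is just a predicate over the chosen versions); the package names and
versions are made up.

```python
from itertools import product

# Made-up packages, each with two candidate versions.
VERSIONS = {"a": ["1.0", "2.0"], "b": ["1.0", "2.0"]}
NAMES = sorted(VERSIONS)


def solutions(constraints):
    """Return every version assignment that satisfies all constraints."""
    result = set()
    for combo in product(*(VERSIONS[n] for n in NAMES)):
        assignment = dict(zip(NAMES, combo))
        if all(check(assignment) for check in constraints):
            result.add(combo)
    return result


# What we know before inspecting any sdist.
base = [lambda s: s["a"] == "2.0"]
# Pretend that inspecting an sdist revealed one more requirement.
extended = base + [lambda s: s["b"] != "1.0"]

before = solutions(base)
after = solutions(extended)

# Adding a constraint never introduces a new solution.
assert after <= before
print(len(before), len(after))
```

So a solution found with the fake dependency-free nodes is either still
valid once the real dependencies are known or gets ruled out; newly
discovered dependencies never make a previously rejected assignment
acceptable.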

But there's often a large difference between what we work out from
first principles and how things actually work :-). Is there anything
I'm missing in the analysis above? Do the relevant pip maintainers
even read this mailing list? :-)

-n

[1] https://mail.python.org/pipermail/distutils-sig/2015-October/026960.html
[2] https://mail.python.org/pipermail/distutils-sig/2015-October/026942.html

-- 
Nathaniel J. Smith -- http://vorpus.org

