[Distutils] PyPI is a sick sick hoarder

Robert Collins robertc at robertcollins.net
Fri May 15 20:57:22 CEST 2015


So, I am working on pip issue 988: pip doesn't resolve packages at all.

This is O(alternatives_per_package^packages): if you are resolving 10
packages with 10 versions each, there are approximately 10^10 or 10G
combinations. 10 packages with 100 versions each - 100^10, or 10^20.
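
To make the counting concrete, here is a minimal sketch that enumerates
every candidate assignment for a toy index, assuming a brute-force
resolver picks exactly one version per package with no pruning. The
package names, versions and function names are made up for
illustration; this is not how pip or any real resolver is implemented.

```python
from itertools import product
from math import prod


def count_combinations(versions_per_package):
    """Size of the brute-force space: one version pick per package."""
    return prod(len(versions) for versions in versions_per_package.values())


def enumerate_combinations(versions_per_package):
    """Yield every candidate assignment as a {package: version} dict."""
    names = list(versions_per_package)
    for picks in product(*(versions_per_package[name] for name in names)):
        yield dict(zip(names, picks))


# Hypothetical toy index: three packages with 2, 3 and 2 versions.
index = {
    "a": ["1.0", "1.1"],
    "b": ["0.1", "0.2", "0.3"],
    "c": ["2.0", "2.1"],
}

all_candidates = list(enumerate_combinations(index))
print(len(all_candidates), count_combinations(index))  # 12 12
```

The count is the product of the per-package version counts, so every
extra package multiplies the space by its number of versions.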

So - it's going to depend pretty heavily on some good heuristics in
whatever final algorithm makes its way in, but the problem is
exacerbated by PyPI's nature.

Most Linux distributions (all that I'm aware of) have at most 5
versions of a package to consider at any time - installed (might be
None), current release, current release security updates, new release
being upgraded to, new release being upgraded to's security updates.
And their common worst case is actually 2 versions: installed==current
release and one new release present. They map alternatives out into
separate packages (e.g. when an older soname is deliberately kept
across an ABI incompatibility, you end up with 2 packages, not 2
versions of one package). So when comparing pip's challenge to apt's:
apt has ~20-30K packages, with alternatives ~= 2, while
pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

Scaling the number of packages is relatively easy; scaling the number
of alternatives is harder. Even 300 packages (the dependency tree for
openstack) at ~5.7 alternatives each is about 5.7^300, or roughly
6 x 10^226, combinations to probe.
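
As a rough check on the scale, the sketch below estimates the size of
the brute-force space on a log10 scale, assuming one independent
version choice per package so that the total is alternatives^packages.
The function name and the "apt-like" / "pip-like" labels are mine; the
figures of ~2 and ~5.7 alternatives and 300 packages are the ones
quoted above.

```python
from math import log10


def log10_combinations(packages, alternatives):
    """log10 of alternatives ** packages, without building the huge number."""
    return packages * log10(alternatives)


# 300 packages is the openstack dependency tree size quoted above.
for label, alternatives in [("apt-like", 2), ("pip-like", 5.7)]:
    exponent = log10_combinations(300, alternatives)
    print(f"{label}: about 10^{exponent:.1f} combinations")
```

Working in logarithms keeps the arithmetic in ordinary floats and
allows a fractional average such as 5.7 alternatives per package.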

I wonder if it makes sense to give some back-pressure to people, or at
the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes

If folk consider PyPI a sort of historical archive then perhaps we
could have a feature to select 'supported' versions by the author, and
allow a query parameter to ask for all the versions.
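
A minimal sketch of what that selection could look like, purely as an
illustration: the `Release` type, the `supported` flag and the
`include_all` switch are all hypothetical and do not correspond to
anything in PyPI or pip today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: str
    supported: bool  # hypothetical flag set by the package author


def candidate_versions(releases, include_all=False):
    """Return the versions a resolver would be offered.

    By default only author-supported releases are listed; include_all
    stands in for the query parameter that asks for the full history.
    """
    return [r.version for r in releases if include_all or r.supported]


# Hypothetical release history for one project.
history = [
    Release("1.0", supported=False),
    Release("1.1", supported=False),
    Release("2.0", supported=True),
    Release("2.1", supported=True),
]

default_view = candidate_versions(history)
archive_view = candidate_versions(history, include_all=True)
```

With this history the default view offers 2 versions instead of 4, and
the archive view still exposes everything for anyone who asks for it.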

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

