[Distutils] PyPI is a sick sick hoarder

Justin Cappos jcappos at nyu.edu
Fri May 15 21:18:10 CEST 2015


One thing to consider is that if conflicts do not exist (or are very rare),
the number of possible combinations is a moot point.  A greedy algorithm
for installation (which just chooses the most favored package to resolve
each dependency) will run in time linear in the number of packages it
would install, if no conflicts exist.
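
As a rough illustration of the greedy approach, here is a minimal sketch.
The INDEX layout and the greedy_resolve name are made up for this example
and are not pip's data model; it also assumes no conflicts, so it never
checks version constraints.

```python
from collections import deque

# Hypothetical index (an assumption for this sketch): package name ->
# candidate versions, most favored first, each paired with the names of
# the packages that version depends on.
INDEX = {
    "app": [("2.0", ["lib", "util"]), ("1.0", ["lib"])],
    "lib": [("1.4", ["util"]), ("1.3", [])],
    "util": [("0.9", [])],
}

def greedy_resolve(root, index):
    """Pick the most favored version of every package reachable from root."""
    chosen = {}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name in chosen:
            continue  # already decided; a greedy resolver never revisits
        version, deps = index[name][0]  # look at the top candidate only
        chosen[name] = version
        queue.extend(deps)
    return chosen

result = greedy_resolve("app", INDEX)
print(result)
```

Each package is decided once and only its top candidate is inspected, so
the work grows with the number of packages installed (plus their
dependency edges), not with the number of versions on the index.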

So, what you are saying about state exploration may be true for a resolver
that uses something like a SAT solver, but doesn't apply to backtracking
dependency resolution (unless a huge number of conflicts occur) or simple
dependency resolution (at all).  SAT solvers do have heuristics to avoid
this blow-up, except in pathological cases.  However, simple / backtracking
dependency resolution systems have the further advantage of not needing to
request unneeded metadata in the first place...
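
To make the metadata point concrete, here is a small backtracking sketch.
METADATA, VERSIONS, fetch and resolve are all hypothetical names invented
for this example, and the version sets stand in for real version
specifiers.  Metadata is requested only for candidates the search
actually tries.

```python
# Hypothetical metadata store (an assumption for this sketch):
# (name, version) -> {dependency name: set of acceptable versions}.
METADATA = {
    ("app", "2.0"): {"lib": {"1.4", "1.3"}, "util": {"0.9"}},
    ("app", "1.0"): {"lib": {"1.3"}},
    ("lib", "1.4"): {"util": {"1.0"}},  # conflicts with app 2.0's util pin
    ("lib", "1.3"): {},
    ("util", "1.0"): {},
    ("util", "0.9"): {},
}
# Candidate versions per package, most favored first.
VERSIONS = {"app": ["2.0", "1.0"], "lib": ["1.4", "1.3"], "util": ["1.0", "0.9"]}

fetched = []  # every metadata request made during the search

def fetch(name, version):
    """Stand-in for a network request for one candidate's metadata."""
    fetched.append((name, version))
    return METADATA[(name, version)]

def resolve(pending, chosen):
    """pending is a list of (name, allowed versions or None).

    Returns a dict of name -> version, or None if no assignment works.
    """
    if not pending:
        return chosen
    (name, allowed), rest = pending[0], pending[1:]
    if name in chosen:
        if allowed is not None and chosen[name] not in allowed:
            return None  # conflict: the caller backs up and tries another version
        return resolve(rest, chosen)
    for version in VERSIONS[name]:
        if allowed is not None and version not in allowed:
            continue  # ruled out without fetching anything
        deps = fetch(name, version)
        found = resolve(rest + list(deps.items()), {**chosen, name: version})
        if found is not None:
            return found
    return None

solution = resolve([("app", None)], {})
print(solution)
print(fetched)
```

In this toy index the search hits one conflict (lib 1.4 wants util 1.0),
backs up to lib 1.3, and finishes without ever requesting metadata for
app 1.0 or util 1.0.  A real resolver would also cache fetch results
rather than asking for util 0.9 twice.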

Thanks,
Justin

On Fri, May 15, 2015 at 2:57 PM, Robert Collins <robertc at robertcollins.net>
wrote:

> So, I am working on pip issue 988: pip doesn't resolve packages at all.
>
> This is O(alternatives_per_package^packages): if you are resolving 10
> packages with 10 versions each, there are approximately 10^10 or 10G
> combinations. 10 packages with 100 versions each - 100^10, or 10^20.
>
> So - it's going to depend pretty heavily on some good heuristics in
> whatever final algorithm makes its way in, but the problem is
> exacerbated by PyPI's nature.
>
> Most Linux (all that I'm aware of) distributions have at most 5
> versions of a package to consider at any time - installed (might be
> None), current release, current release security updates, new release
> being upgraded to, new release being upgraded to's security updates.
> And their common worst case is actually 2 versions: installed==current
> release and one new release present. They map alternatives out into
> separate packages (e.g. when an older soname is deliberately kept
> across an ABI incompatibility, you end up with 2 packages, not 2
> versions of one package). So when comparing pip's challenge to apt's:
> apt has ~20-30K packages, with alternatives ~= 2, and
> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
>
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> OpenStack) at 5 alternatives each is 5^300, on the order of 10^209,
> combinations to probe in the worst case.
>
> I wonder if it makes sense to give some back-pressure to people, or at
> the very least encourage them to remove distributions that:
>  - they don't support anymore
>  - have security holes
>
> If folk consider PyPI a sort of historical archive then perhaps we
> could have a feature to select 'supported' versions by the author, and
> allow a query parameter to ask for all the versions.
>
> -Rob
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>