[Distutils] PyPI is a sick sick hoarder

David Cournapeau cournape at gmail.com
Sat May 16 07:00:21 CEST 2015


On Sat, May 16, 2015 at 10:52 AM, Robert Collins <robertc at robertcollins.net>
wrote:

> On 16 May 2015 at 13:45, Donald Stufft <donald at stufft.io> wrote:
> >
> >> On May 15, 2015, at 9:22 PM, Robert Collins <robertc at robertcollins.net>
> wrote:
> >>
> >> On 16 May 2015 at 11:08, Marcus Smith <qwcode at gmail.com> wrote:
> >>> Why not start with pip at least being a "simple" fail-on-conflict
> resolver
> >>> (vs the "1st found wins" resolver it is now)...
> >>>
> >>> You'd "backtrack" for the sake of re-walking when new constraints are
> found,
> >>> but not for the purpose of solving conflicts.
> >>>
> >>> I know you're motivated to solve Openstack build issues, but many of
> the
> >>> issues I've seen in the pip tracker, I think would be solved without
> the
> >>> backtracking resolver you're trying to build.
> >>
> >> Well, I'm scratching the itch I have. If its too hard to get something
> >> decent, sure I might back off in my goals, but I see no point aiming
> >> for something less than all the other language specific packaging
> >> systems out there have.
> >
> >
> > So what makes the other language specific packaging systems different?
> As far
> > as I know all of them have complete archives (e.g. they are like PyPI
> where they
> > have a lot of versions, not like Linux Distros). What can we learn from
> how they
> > solved this?
>
> NB; I have by no means finished low hanging heuristics and space
> trimming stuff :). I have some simple things in mind and am sure I'll
> end up with something 'good enough' for day to day use. The thing I'm
> worried about is the long term health of the approach.
>
> Good questions. Some of it is structural I suspect. A quick rundown.
> cabal (haskell) has a backtracking solver that accepts various
> parameters to tell it to try harder.
> javascript effectively vendors every dep ever, so you end up with many
> copies of the same library at different versions in the same process.
> rust's cargo system currently solves everything in a single project
> only - it has no binary packaging, only vendor-into-a-binary-build
> packaging.
> The gem behaviour I'm not yet familiar with.
> perl I used to know but time has eroded it :/.
>

FWIW, php uses a SAT-based solver in composer, which started as a port of
libsolv (the SAT solver used by openSUSE and soon Fedora).

I am no expert, but I don't understand why backtracking algorithms would to
be faster than SAT, since they both potentially need to walk over the full
set of possible solutions. It is hard to reason about the cost because the
worst case is in theory growing exponentially in both cases.

With a SAT-based algorithm for dependency resolution, it is relatively
simple to apply heuristics which massively prune the search space. For
example, when considering package A with say 10 potential versions A_1,
etc..., in theory, you need to generate the rules:

# - means not install, + means install
- A_1 | -  A_2
- A_1 | - A_3
...

and those constitute most of the rules in common cases. But it is possible
to tweak the SAT implementation to replace those rules by a single "AtMost
one of" rule per *package*, which means the #rules do not grow much by
versions.

The real difficulty of SAT-based solver is the optimization part: many
actually valid solutions are not acceptable, and that's where the
heuristics get more complicated.

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/distutils-sig/attachments/20150516/6196b9c4/attachment-0001.html>


More information about the Distutils-SIG mailing list