[Distutils] PyPI is a sick sick hoarder

Daniel Holth dholth at gmail.com
Sat May 16 17:40:58 CEST 2015


On May 16, 2015 11:22 AM, "David Cournapeau" <cournape at gmail.com> wrote:
>
>
>
> On Sat, May 16, 2015 at 11:36 PM, Justin Cappos <jcappos at nyu.edu> wrote:
>>>
>>> I am no expert, but I don't understand why backtracking algorithms
would to be faster than SAT, since they both potentially need to walk over
the full set of possible solutions. It is hard to reason about the cost
because the worst case is in theory growing exponentially in both cases.
>>
>>
>> This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988
>>
>> Each algorithm could be computationally more efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.
>>
>> However, a better question is does the computational difference matter?
If one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.
>
>
> Pure SAT is fast enough in practice in my experience (concretely: solving
thousand of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the 0
install people, but OTOH, this seems to be what's used in conda, so who
knows :)

Where optimizing means something like "find a solution with the newest
possible releases of the required packages", not execution speed.

> If you SAT solver is in pure python, you can choose a "direction" of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.
>
>>
>> This only considers computation cost though.  Other factors can become
more expensive than computation.  For example, SAT solvers need all the
rules to consider.  So a SAT solution needs to effectively download the
full dependency graph before starting.  A backtracking dependency resolver
can just download packages or dependency information as it considers them.
The bandwidth cost for SAT solvers should be higher.
>
>
> With a reasonable representation, I think you can make it small enough.
To give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of pypi may be a few MB
only fetch, which is generally faster that doing tens of http requests.
>
> The libsvolv people worked on a binary representation that may also be
worth looking at.
>
>>
>> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
to talk more with you and/or Robert about minutiae that others may not care
about.
>
>
> Sure, I would be happy too. As I mentioned before, we have some code
around a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experimente with it.
>
> David
>
>
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/distutils-sig/attachments/20150516/0acffbaf/attachment-0001.html>


More information about the Distutils-SIG mailing list