[Distutils] Handling the binary dependency management problem

Nick Coghlan ncoghlan at gmail.com
Mon Dec 2 14:22:21 CET 2013


On 2 Dec 2013 21:57, "Paul Moore" <p.f.moore at gmail.com> wrote:
>
> On 2 December 2013 10:45, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> > Nick's proposal is basically incompatible with allowing Christoph
> > Gohlke to use pip and wheels. Christoph provides a bewildering array
> > of installers for prebuilt packages that are interchangeable with
> > other builds at the level of Python code but not necessarily at the
> > binary level. So, for example, his scipy is incompatible with the
> > "official" (from SourceForge) Windows numpy build because it links
> > with the non-free Intel MKL library and it needs numpy to link against
> > the same. Installing his scipy over the other numpy results in this:
> > https://mail.python.org/pipermail//python-list/2013-September/655669.html
>
> Ah, OK. I had not seen this issue as I've always either used
> Christoph's builds or not used them. I've never tried or needed to mix
> builds. This is probably because I'm very much only a casual user of
> the scientific stack, so my needs are pretty simple.
>
> > So Christoph can provide wheels and people can manually download them
> > and install from them but would beginners find that any easier than
> > running the .exe installers? The .exe installers are more powerful and
> > can do things like the numpy super-pack that distributes binaries for
> > different levels of SSE support (as discussed previously on this list
> > the wheel format cannot currently achieve this). Beginners will also
> > find .exe installers more intuitive than running pip on the command
> > line and will typically get better error messages etc. than pip
> > provides. So I don't really see why Christoph should bother switching
> > formats (as noted by Paul before anyone who wants a wheel cache can
> > easily convert his installers into wheels).
>
> The crucial answer here is that exe installers don't recognise
> virtualenvs. Again, I can imagine that a scientific user would
> naturally install Python and put all the scientific modules into the
> system Python - but precisely because I'm a casual user, I want to
> keep big dependencies like numpy/scipy out of my system Python, and so
> I use virtualenvs.
>
> The big improvement pip/wheel give over wininst is a consistent user
> experience, whether installing into the system Python, a virtualenv,
> or a Python 3.3+ venv. (I used to use wininsts in preference to pip,
> so please excuse a certain level of the enthusiasm of a convert here
> :-))

And the conda folks are working on playing nice with virtualenv - I don't
expect we'll see a similar offer from Microsoft for MSI any time soon :)
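
For anyone who wants the wheel route today: the wheel project's
"convert" subcommand can turn a bdist_wininst installer into a wheel
that pip will then happily install into a virtualenv. Something along
these lines (the filename is purely illustrative):

    wheel convert numpy-1.8.0.win32-py2.7.exe
    pip install --no-index --find-links=. numpy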

> > AFAICT what Nick is saying is that it's not possible for pip and PyPI
> > to guarantee the compatibility of different binaries because, unlike
> > apt-get and friends, only part of the software stack is controlled.
> > However I think this is not the most relevant difference between pip
> > and apt-get here. The crucial difference is that apt-get communicates
> > with repositories where all code and all binaries are under control of
> > a single organisation. Pip (when used normally) communicates with PyPI
> > and no single organisation controls the content of PyPI. So there's no
> > way for pip/PyPI to guarantee *anything* about the compatibility of
> > the code that they distribute/install, whether the problems are to do
> > with binary compatibility or just compatibility of pure Python code.
> > For pure Python distributions package authors are expected to solve
> > the compatibility problems and pip provides version specifiers etc
> > that they can use to do this. For built distributions they could do
> > the same - except that pip/PyPI don't provide a mechanism for them to
> > do so.
>
> Agreed. Expecting the same level of compatibility guarantees from PyPI
> as is provided by RPM/apt is unrealistic, in my view. Heck, even pure
> Python packages don't give any indication as to whether they are
> Python 3 compatible in some cases (I just hit this today with the
> binstar package, as an example). This is a fact of life with a
> repository that doesn't QA uploads.

Exactly, this is the difference between pip and conda - conda is a solution
for installing from curated *collections* of packages. It's somewhat
related to the tagging system people are speculating about for PyPI, but
instead of being purely hypothetical, it already exists.

Because it uses hash based dependencies, there's no chance of things
getting mixed up. That design has other problems which limit the niche
where a tool like conda is the right answer, but within that niche, hash
based dependency management helps bring the combinatorial explosion of
possible variations under control.
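
To sketch the idea (this is just an illustration of the concept, not
conda's actual metadata format): a hash identified scipy build pins its
numpy dependency to one exact build, so a scipy linked against MKL can
never end up installed alongside a numpy that wasn't:

    scipy 0.13.0 (build a1b2c3) depends on: numpy 1.7.1 (build d4e5f6)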

> > Because PyPI is not a centrally controlled single software stack it
> > needs a different model for ensuring compatibility - one driven by the
> > community. People in the Python community are prepared to spend a
> > considerable amount of time, effort and other resources solving this
> > problem. Consider how much time Christoph Gohlke must spend maintaining
> > such a large internally consistent set of built packages. He has
> > created a single compatible binary software stack for scientific
> > computation. It's just that PyPI doesn't give him any way to
> > distribute it. If perhaps he could own a tag like "cgohlke" and upload
> > numpy:cgohlke and scipy:cgohlke then his scipy:cgohlke wheel could
> > depend on numpy:cgohlke and numpy:cgohlke could somehow communicate
> > the fact that it is incompatible with any other scipy distribution.
> > This is one way in which pip/PyPI could facilitate the Python
> > community to solve the binary compatibility problems.
>
> Exactly.
>
> > [As an aside I don't know whether Christoph's Intel license would
> > permit distribution via PyPI.]
>
> Yes, I'd expect Christoph's packages would likely always have to remain
> off PyPI (if for no other reason than the fact that he isn't the owner
> of the packages he's providing distributions for). But if he did
> provide a PyPI-style index, compatibility could be handled manually
> via `pip install --index-url <Christoph's repo>`. And even if they are
> not on PyPI, your custom tag suggestion above would still be
> beneficial.

Yes, I think a "variant" tag would be a useful feature for wheels as well.
It's not a substitute for hash identified fully curated stacks, though.
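
Nobody has designed a variant tag scheme yet, so the syntax below is
pure speculation, but the idea would be something like:

    scipy-0.13.0-cp27-none-win32.whl        # default build
    scipy-0.13.0-cp27-none-win32_mkl.whl    # hypothetical MKL variant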

>
> > Another way would be to allow the community to create compatibility
> > tags so that projects like numpy would have mechanisms to indicate
> > e.g. Fortran ABI compatibility. In this model no one owns a particular
> > tag but projects that depend on one another could simply use them in a
> > consistent way that pip could understand.
> >
> > The impression I got from Nick's initial post is that, having
> > discovered that the compatibility tags used in the wheel format are
> > insufficient for the needs of the Python community and that it's not
> > possible to enumerate the tags needed, pip/PyPI should just give up on
> > the problem of binary compatibility. I think it would be better to
> > think about simple mechanisms that the authors of the concerned
> > packages could use so that people in the Python community can solve
> > these problems for each of the packages they contribute to. There is
> > enough will out there to make this work for all the big packages and
> > problematic operating systems if only PyPI will allow it.
>
> Agreed - completely giving up on the issue in favour of a separately
> curated solution seems wrong to me, at least in the sense that it
> abandons people who don't fit cleanly into either solution (e.g.,
> someone who needs to use virtualenv rather than conda environments,
> but needs numpy, and a package that Christoph distributes, but conda
> doesn't - who should that person turn to for help?). I don't see any
> problem with admitting we can't solve every aspect of the problem
> automatically, but that doesn't preclude providing extensibility to
> allow communities to solve the parts of the issue that impact their
> own area.

We already have more than enough work to do in the packaging space - why
come up with a new solution for publication of curated stacks when the
conda folks already have one, and it works cross-platform for the stack
with the most complex external dependencies?

If static linking or bundling is an option, then wheels can already handle
it. If targeting a particular controlled environment (or a user base
prepared to get the appropriate bits in place themselves), wheels can also
handle shared external binary dependencies.

What they can't easily do is share a fully curated stack of software,
including external binary dependencies, in a way that works across
platforms and isn't trivially easy to screw up by accidentally installing
from the wrong place.

Just as a while ago I started thinking it made more sense to rehabilitate
setuptools than it did to try to replace it any time soon, and we mostly
let the TUF folks run with the end-to-end security problem, I now think it
makes sense to start collaborating with the conda folks on the
"distribution of curated software stacks" problem, as that reduces the
pressure on the core tools to address that concern.
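
The bootstrapping story could eventually be as simple as the following
sketch (assuming conda's PyPI entry and current command names - I
haven't verified the details):

    pip install conda
    conda init
    conda create -n scientific numpy scipy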

PyPI wheels would then be about publishing "default" versions of
components, with the broadest compatibility, while conda would be a
solution for getting access to alternate builds that may be faster, but
require external shared dependencies.

> As a quick sanity check question - what is the long-term advice for
> Christoph (and others like him)? Continue distributing wininst
> installers? Move to wheels? Move to conda packages? Do whatever you
> want, we don't care? We're supposedly pushing pip as "the officially
> supported solution to package management" - how can that be reconciled
> with *not* advising builders[1] to produce pip-compatible packages?

What Christoph is doing is producing a cross-platform curated binary
software stack, including external dependencies. That's precisely the
problem I'm suggesting we *not* try to solve in the core tools any time
soon, but instead support bootstrapping conda to solve the problem at a
different layer.

So the pip-compatible builds for those tools would likely miss out on some
of the external acceleration features, while curated stacks built with
other options could be made available through conda (using hash based
dependencies to ensure consistency).

Revising the wheel spec to include variant tags, or to improve the way
platforms are identified, is a long way down the todo list, and that's
without allowing for the subsequent time needed to update pip and other
tools.

By ceding the "distribution of cross-platform curated software stacks with
external binary dependencies" problem to conda, users would get a solution
to that problem that they can use *now*, rather than in some indefinite
future after the metadata 2.0 and end-to-end security changes in the core
tools have been completed.

Now, it may be that we take a closer look at conda and decide there are
critical issues that need to be addressed before it can be recommended
rather than just mentioned (e.g. I don't know off the top of my head
whether the server comms are appropriately secured). This thread was mostly
about pointing out that there's a thorny subset of the cross-platform
software distribution problem that I think we can offload to someone else
and have something that mostly works *today*, rather than only getting to
it in some speculative future after a bunch of additional currently
hand-wavey design work has been completed.

Cheers,
Nick.

>
> Paul
>
> [1] At least, those who are not specifically affiliated with a group
> offering a curated solution (like conda or ActiveState).