[Distutils] Handling the binary dependency management problem

Paul Moore p.f.moore at gmail.com
Mon Dec 2 12:57:13 CET 2013


On 2 December 2013 10:45, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> Nick's proposal is basically incompatible with allowing Christoph
> Gohlke to use pip and wheels. Christoph provides a bewildering array
> of installers for prebuilt packages that are interchangeable with
> other builds at the level of Python code but not necessarily at the
> binary level. So, for example, his scipy is incompatible with the
> "official" (from SourceForge) Windows numpy build because it links
> with the non-free Intel MKL library and it needs numpy to link against
> the same. Installing his scipy over the other numpy results in this:
> https://mail.python.org/pipermail//python-list/2013-September/655669.html

Ah, OK. I had not seen this issue, as I've always either used
Christoph's builds exclusively or not used them at all - I've never
tried or needed to mix builds. This is probably because I'm very much
only a casual user of the scientific stack, so my needs are pretty
simple.

> So Christoph can provide wheels and people can manually download them
> and install from them but would beginners find that any easier than
> running the .exe installers? The .exe installers are more powerful and
> can do things like the numpy super-pack that distributes binaries for
> different levels of SSE support (as discussed previously on this list
> the wheel format cannot currently achieve this). Beginners will also
> find .exe installers more intuitive than running pip on the command
> line and will typically get better error messages etc. than pip
> provides. So I don't really see why Christoph should bother switching
> formats (as noted by Paul before anyone who wants a wheel cache can
> easily convert his installers into wheels).

The crucial answer here is that exe installers don't recognise
virtualenvs. Again, I can imagine that a scientific user would
naturally install Python and put all the scientific modules into the
system Python - but precisely because I'm a casual user, I want to
keep big dependencies like numpy/scipy out of my system Python, and so
I use virtualenvs.

The big improvement pip/wheel give over wininst is a consistent user
experience, whether installing into the system Python, a virtualenv,
or a Python 3.3+ venv. (I used to use wininst installers in preference
to pip, so please excuse a certain level of convert's enthusiasm here
:-))
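
As a side note, for anyone who does want to convert Christoph's
installers into wheels (as mentioned in the quote above), a rough
sketch of how that looks - the filenames below are purely
illustrative, use whatever installer you actually downloaded:

    pip install wheel
    # convert a wininst installer into a wheel; the output filename is
    # derived from the installer's own metadata
    wheel convert numpy-1.8.0.win-amd64-py3.3.exe
    # the resulting wheel installs fine into a virtualenv
    pip install numpy-1.8.0-cp33-none-win_amd64.whl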

> AFAICT what Nick is saying is that it's not possible for pip and PyPI
> to guarantee the compatibility of different binaries because unlike
> apt-get and friends only part of the software stack is controlled.
> However I think this is not the most relevant difference between pip
> and apt-get here. The crucial difference is that apt-get communicates
> with repositories where all code and all binaries are under control of
> a single organisation. Pip (when used normally) communicates with PyPI
> and no single organisation controls the content of PyPI. So there's no
> way for pip/PyPI to guarantee *anything* about the compatibility of
> the code that they distribute/install, whether the problems are to do
> with binary compatibility or just compatibility of pure Python code.
> For pure Python distributions package authors are expected to solve
> the compatibility problems and pip provides version specifiers etc
> that they can use to do this. For built distributions they could do
> the same - except that pip/PyPI don't provide a mechanism for them to
> do so.

Agreed. Expecting the same level of compatibility guarantees from PyPI
as is provided by RPM/apt is unrealistic, in my view. Heck, even pure
Python packages don't give any indication as to whether they are
Python 3 compatible in some cases (I just hit this today with the
binstar package, as an example). This is a fact of life with a
repository that doesn't QA uploads.
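
As an aside, the metadata for declaring this does actually exist - the
optional trove classifiers, which end up in a package's PKG-INFO as
lines like these - but nothing on PyPI requires or checks them, which
is exactly the problem:

    Classifier: Programming Language :: Python :: 2.7
    Classifier: Programming Language :: Python :: 3.3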

> Because PyPI is not a centrally controlled single software stack it
> needs a different model for ensuring compatibility - one driven by the
> community. People in the Python community are prepared to spend a
> considerable amount of time, effort and other resources solving this
> problem. Consider how much time Christoph Gohlke must spend maintaining
> such a large internally consistent set of built packages. He has
> created a single compatible binary software stack for scientific
> computation. It's just that PyPI doesn't give him any way to
> distribute it. If perhaps he could own a tag like "cgohlke" and upload
> numpy:cgohlke and scipy:cgohlke then his scipy:cgohlke wheel could
> depend on numpy:cgohlke and numpy:cgohlke could somehow communicate
> the fact that it is incompatible with any other scipy distribution.
> This is one way in which pip/PyPI could help the Python
> community solve the binary compatibility problems.

Exactly.
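
To make that idea concrete, a purely hypothetical sketch of what
owner-tagged builds might look like - I should stress that none of
this syntax exists in pip or the wheel spec today:

    # hypothetical syntax, not implemented anywhere
    pip install numpy:cgohlke scipy:cgohlke
    # scipy:cgohlke's metadata could then declare something like
    #     Requires-Dist: numpy:cgohlke
    # and pip would refuse to mix it with a numpy built outside that stack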

> [As an aside, I don't know whether Christoph's Intel license would
> permit distribution via PyPI.]

Yes, I'd expect Christoph's packages would likely always have to
remain off PyPI (if for no other reason than the fact that he isn't
the owner of the packages he's providing distributions for). But if he
did provide a PyPI-style index, compatibility could be handled
manually via `pip install --index-url <Christoph's repo>`. And even if
they are not on PyPI, your custom tag suggestion above would still be
beneficial.
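
For completeness, that manual route needs nothing new from pip - only
the index URL below is made up:

    pip install --index-url https://www.example.com/cgohlke/simple/ scipy

or, persistently, in pip's config file (pip.ini on Windows, pip.conf
elsewhere):

    [global]
    index-url = https://www.example.com/cgohlke/simple/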

> Another way would be to allow the community to create compatibility
> tags so that projects like numpy would have mechanisms to indicate
> e.g. Fortran ABI compatibility. In this model no one owns a particular
> tag but projects that depend on one another could simply use them in a
> consistent way that pip could understand.
>
> The impression I got from Nick's initial post is that, having
> discovered that the compatibility tags used in the wheel format are
> insufficient for the needs of the Python community and that it's not
> possible to enumerate the tags needed, pip/PyPI should just give up on
> the problem of binary compatibility. I think it would be better to
> think about simple mechanisms that the authors of the concerned
> packages could use so that people in the Python community can solve
> these problems for each of the packages they contribute to. There is
> enough will out there to make this work for all the big packages and
> problematic operating systems if only PyPI will allow it.

Agreed - completely giving up on the issue in favour of a separately
curated solution seems wrong to me, at least in the sense that it
abandons people who don't fit cleanly into either solution (e.g.,
someone who needs to use virtualenv rather than conda environments,
but also needs numpy plus a package that Christoph distributes and
conda doesn't - who should that person turn to for help?). I don't see
any problem with admitting we can't solve every aspect of the problem
automatically, but that doesn't preclude providing extensibility to
allow communities to solve the parts of the issue that impact their
own area.
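
For reference, the natural place for that extensibility would be the
compatibility tags already embedded in wheel filenames (python tag -
ABI tag - platform tag). The first name below uses a tag combination
pip understands today; the second is a purely hypothetical variant of
the kind being discussed, not valid under the current spec:

    numpy-1.8.0-cp33-none-win_amd64.whl
    numpy-1.8.0-cp33-none-win_amd64_mkl.whl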

As a quick sanity check question - what is the long-term advice for
Christoph (and others like him)? Continue distributing wininst
installers? Move to wheels? Move to conda packages? Do whatever you
want, we don't care? We're supposedly pushing pip as "the officially
supported solution to package management" - how can that be reconciled
with *not* advising builders[1] to produce pip-compatible packages?

Paul

[1] At least, those who are not specifically affiliated with a group
offering a curated solution (like conda or ActiveState).

