[Distutils] Handling the binary dependency management problem

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Dec 2 11:45:30 CET 2013


On 2 December 2013 09:19, Paul Moore <p.f.moore at gmail.com> wrote:
> On 2 December 2013 07:31, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> The only problem I want to take off the table is the one where
>> multiple wheel files try to share a dynamically linked external binary
>> dependency.
>
> OK. Thanks for the clarification.
>
> Can I suggest that we need to be very careful how any recommendation
> in this area is stated? I certainly didn't get that impression from
> your initial posting, and from the other responses it doesn't look
> like I was the only one.

I understood what Nick meant but I still don't understand how he's
come to this conclusion.

> We're only just starting to get real credibility for wheel as a
> distribution format, and we need to get a very strong message out that
> wheel is the future, and people should be distributing wheels as their
> primary binary format. My personal litmus test is the scientific
> community - when Christoph Gohlke is distributing his (Windows) binary
> builds as wheels, and projects like numpy, ipython, scipy etc are
> distributing wheels on PyPI, rather than bdist_wininst, I'll feel like
> we have got to the point where wheels are "the norm". The problem is,
> of course, that with conda being a scientific distribution at heart,
> any message we issue that promotes conda in any context will risk
> confusion in that community.

Nick's proposal is basically incompatible with allowing Christoph
Gohlke to use pip and wheels. Christoph provides a bewildering array
of installers for prebuilt packages that are interchangeable with
other builds at the level of Python code but not necessarily at the
binary level. So, for example, his scipy is incompatible with the
"official" (from SourceForge) Windows numpy build because it links
with the non-free Intel MKL library and it needs numpy to link against
the same. Installing his scipy over the other numpy results in this:
https://mail.python.org/pipermail//python-list/2013-September/655669.html

So Christoph can provide wheels and people can manually download them
and install from them but would beginners find that any easier than
running the .exe installers? The .exe installers are more powerful and
can do things like the numpy super-pack that distributes binaries for
different levels of SSE support (as discussed previously on this list,
the wheel format cannot currently achieve this). Beginners will also
find .exe installers more intuitive than running pip on the command
line and will typically get better error messages etc. than pip
provides. So I don't really see why Christoph should bother switching
formats (as Paul noted earlier, anyone who wants a wheel cache can
easily convert his installers into wheels).

AFAICT what Nick is saying is that it's not possible for pip and PyPI
to guarantee the compatibility of different binaries because, unlike
apt-get and friends, only part of the software stack is controlled.
However, I think this is not the most relevant difference between pip
and apt-get here. The crucial difference is that apt-get communicates
with repositories where all code and all binaries are under control of
a single organisation. Pip (when used normally) communicates with PyPI
and no single organisation controls the content of PyPI. So there's no
way for pip/PyPI to guarantee *anything* about the compatibility of
the code that they distribute/install, whether the problems are to do
with binary compatibility or just compatibility of pure Python code.
For pure Python distributions, package authors are expected to solve
the compatibility problems and pip provides version specifiers etc.
that they can use to do this. For built distributions they could do
the same - except that pip/PyPI don't provide a mechanism for them to
do so.

Because PyPI is not a centrally controlled single software stack it
needs a different model for ensuring compatibility - one driven by the
community. People in the Python community are prepared to spend a
considerable amount of time, effort and other resources solving this
problem. Consider how much time Christoph Gohlke must spend maintaining
such a large internally consistent set of built packages. He has
created a single compatible binary software stack for scientific
computation. It's just that PyPI doesn't give him any way to
distribute it. If perhaps he could own a tag like "cgohlke" and upload
numpy:cgohlke and scipy:cgohlke then his scipy:cgohlke wheel could
depend on numpy:cgohlke and numpy:cgohlke could somehow communicate
the fact that it is incompatible with any other scipy distribution.
This is one way in which pip/PyPI could help the Python
community solve the binary compatibility problems.
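
To make the owner-tag idea concrete, here is a rough Python sketch.
Nothing in it exists in pip or PyPI: the Build class, the
"same_tag_only" field and check_install() are all hypothetical names
invented for illustration. It only shows the kind of check an
installer could run if builds carried a tag such as "cgohlke".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """A built distribution named as project:tag (hypothetical model)."""
    name: str
    tag: str                   # e.g. "cgohlke" or "official"
    requires: tuple = ()       # (name, tag) pairs this build depends on
    same_tag_only: tuple = ()  # projects that must share this build's tag


def check_install(installed, candidate):
    """Return the reasons why candidate cannot be added to installed."""
    problems = []
    by_name = {b.name: b for b in installed}
    # The candidate's tagged requirements must match what is installed.
    for req_name, req_tag in candidate.requires:
        have = by_name.get(req_name)
        if have is None:
            problems.append(f"{candidate.name}:{candidate.tag} needs "
                            f"{req_name}:{req_tag} (not installed)")
        elif have.tag != req_tag:
            problems.append(f"{candidate.name}:{candidate.tag} needs "
                            f"{req_name}:{req_tag}, found "
                            f"{have.name}:{have.tag}")
    # Either side may declare that the other must carry the same tag.
    for other in installed:
        for a, b in ((candidate, other), (other, candidate)):
            if b.name in a.same_tag_only and b.tag != a.tag:
                problems.append(f"{a.name}:{a.tag} is incompatible with "
                                f"{b.name}:{b.tag}")
    return problems


numpy_cg = Build("numpy", "cgohlke", same_tag_only=("scipy",))
scipy_cg = Build("scipy", "cgohlke", requires=(("numpy", "cgohlke"),))
numpy_off = Build("numpy", "official")

print(check_install([numpy_cg], scipy_cg))   # compatible pair
print(check_install([numpy_off], scipy_cg))  # mismatched numpy build
```

The point of the sketch is that the rules come from the package
owner rather than from pip: pip would only need to compare tags.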

[As an aside, I don't know whether Christoph's Intel license would
permit distribution via PyPI.]

Another way would be to allow the community to create compatibility
tags so that projects like numpy would have mechanisms to indicate
e.g. Fortran ABI compatibility. In this model no one owns a particular
tag but projects that depend on one another could simply use them in a
consistent way that pip could understand.
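
As a sketch of how unowned, community-defined tags might be
compared, assume each build declares a small mapping of tag names to
values. The tag names and values below ("fortran_abi", "blas",
"g77", "gfortran", "mkl") are made-up examples and abi_conflicts()
is a hypothetical helper, not part of the wheel format or of pip.

```python
def abi_conflicts(a, b):
    """Return the tag names both builds declare with different values.

    Tags that only one build declares are ignored, so a project only
    constrains the things it actually cares about.
    """
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


# Hypothetical tags declared by two builds of interdependent projects.
numpy_build = {"fortran_abi": "g77", "blas": "mkl"}
scipy_build = {"fortran_abi": "gfortran"}

print(abi_conflicts(numpy_build, scipy_build))
```

An installer using this model would refuse the combination (or look
for another build) whenever the returned list is non-empty.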

The impression I got from Nick's initial post is that, having
discovered that the compatibility tags used in the wheel format are
insufficient for the needs of the Python community and that it's not
possible to enumerate the tags needed, pip/PyPI should just give up on
the problem of binary compatibility. I think it would be better to
think about simple mechanisms that the authors of the concerned
packages could use so that people in the Python community can solve
these problems for each of the packages they contribute to. There is
enough will out there to make this work for all the big packages and
problematic operating systems if only PyPI will allow it.


Oscar

