[Distutils] Handling the binary dependency management problem

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Dec 3 13:49:27 CET 2013


On 3 December 2013 11:54, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 3 December 2013 21:22, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>> AFAICT conda/binstar are alternatives for pip/PyPI that happen to host
>> binaries for some packages that don't have binaries on PyPI. (conda
>> also provides a different - incompatible - take on virtualenvs but
>> that's not relevant to this proposal).
>
> It sounds like I may have been confusing two presentations at the
> packaging mini-summit, as I would have sworn conda used hashes to
> guarantee a consistent set of packages. I know I have mixed up
> features between hashdist and conda in the past (and there have been
> some NixOS features mixed in there as well), so it wouldn't be the
> first time that has happened - the downside of mining different
> distribution systems for ideas is that sometimes I forget where I
> encountered particular features :)

I had the same confusion with hashdist at the start of this thread,
when I said that conda was targeted at HPC. So if we both made the
same mistake I guess it's forgivable :)

> If conda doesn't offer such an internal consistency guarantee for
> published package sets, then I agree with the criticism that it's just
> an alternative to running a private PyPI index server hosting wheel
> files pre-built with particular options, and thus it becomes
> substantially less interesting to me :(

Perhaps Travis, who is still CC'ed here, could comment on this, since
it is apparent that no one here really understands what conda is, and
he apparently works for Continuum Analytics, so he should (hopefully)
know a little more...

> Under that model, what conda is doing is *already covered* in the
> draft metadata 2.0 spec (as of the changes I posted about the other
> day), since that now includes an "integrator suffix" (to indicate when
> a downstream rebuilder has patched the software), as well as a
> "python.integrator" metadata extension to give details of the rebuild.
> The namespacing in the wheel case is handled by not allowing rebuilds
> to be published on PyPI - they have to be published on a separate
> index server, and thus can be controlled based on where you tell pip
> to look.

Do you mean to say that PyPI can (should) only host a
binary-compatible set of wheels and that other index servers should do
the same?

Either way, I still think that there needs to be some kind of
compatibility tag for this: the existing wheel tags (e.g.
cp27-none-win32) capture the Python version, ABI and platform, but
say nothing about which BLAS/LAPACK library a build links against.

> So, I apologise for starting the thread based on what appears to be a
> fundamentally false premise, although I think it has still been useful
> despite that error on my part (as the user confusion is real, even
> though my specific proposal no longer seems as useful as I first
> thought).
>
> I believe helping the conda devs to get it to play nice with virtual
> environments is still a worthwhile exercise though (even if just by
> pointing out areas where it *doesn't* currently interoperate well, as
> we've been doing in the last day or so), and if the conda
> bootstrapping issue is fixed by publishing wheels (or vendoring
> dependencies), then "try conda if there's no wheel" may still be a
> reasonable fallback recommendation.

Well, for a start, conda (at least according to my failed build)
overwrites the virtualenv activate scripts with its own scripts that
do something completely different and can't even be called with the
same signature. So it looks to me as if there is no intention of
virtualenv compatibility.

As for "try conda if there's no wheel": according to what I've read,
that seems to be what people who currently use conda already do.

Another thing occurred to me during the course of this thread: to
what extent could Provides/Requires help with these binary
incompatibility problems? For example, numpy really does provide
multiple interfaces:
1) An importable Python module that can be used from Python code.
2) A C-API that can be used by compiled C extensions.
3) BLAS/LAPACK libraries, with a particular Fortran ABI, exposed to
any other libraries in the same process.

Perhaps the solution is that a build of a numpy wheel should state
explicitly what it Provides at each of these levels, e.g.:

Provides: numpy
Provides: numpy-capi-v1
Provides: numpy-openblas-g77

Then a built wheel for scipy can Require the same things. Christoph
Gohlke could provide a numpy wheel with:

Provides: numpy
Provides: numpy-capi-v1
Provides: numpy-intelmkl
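
And his scipy wheel could then declare the matching requirements; a
hypothetical sketch, mirroring the Provides lines above:

Requires: numpy
Requires: numpy-capi-v1
Requires: numpy-intelmkl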

This would mean that pip would understand the binary dependency
problem during dependency resolution: it could reject an incompatible
wheel at install time, and it could automatically find a compatible
wheel if one exists on the index server. Unlike hash-based
dependencies, this scheme also makes it possible to depend on the
numpy C-API without necessarily depending on any particular
BLAS/LAPACK library and Fortran compiler combination.
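
To make the resolution idea concrete, here is a minimal Python sketch
of how an installer could match a candidate wheel's Requires against
the virtual names Provided by what is already installed. All of the
names here are hypothetical, and this is not real pip code:

# Virtual names Provided by the distributions already installed
# (e.g. a numpy wheel built against OpenBLAS with g77).
installed_provides = {
    "numpy",
    "numpy-capi-v1",
    "numpy-openblas-g77",
}

# Requires declared by a candidate scipy wheel that was built
# against MKL instead.
candidate_requires = [
    "numpy",
    "numpy-capi-v1",
    "numpy-intelmkl",
]

missing = [r for r in candidate_requires if r not in installed_provides]
if missing:
    # Reject this wheel and keep looking for a build whose Requires
    # are all satisfied, rather than installing something that will
    # crash at runtime.
    print("Incompatible wheel; unsatisfied requirements:", missing)

In a real resolver this check would just be one more filter alongside
the existing version constraints.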

The confusing part would be that a built wheel then doesn't Provide
the same things as the corresponding sdist. How would anyone know
what an sdist would Provide without first building it into a wheel?
Would there need to be a way for pip to tell the sdist what pip wants
it to Provide when building it?


Oscar

