[Distutils] Handling the binary dependency management problem

Nick Coghlan ncoghlan at gmail.com
Sun Dec 1 23:17:48 CET 2013

On 2 Dec 2013 06:48, "Paul Moore" <p.f.moore at gmail.com> wrote:
> On 1 December 2013 19:21, Marcus Smith <qwcode at gmail.com> wrote:
> >>> sometimes mean needing to build components with external dependencies
> >>> from source
> >
> > you mean build once (or maybe after system updates for wheels with
> > binary deps), and cache as a local wheel, right?

Right. The advantage of conda is that its use of hash-based dependencies
lets it distribute arbitrary binary dependencies, whether they're Python
libraries or not.

> Note that it is possible to convert binary packages from other formats
> to wheel. We already support egg and bdist_wininst. If conda has built
> packages that are worth supporting, I doubt that a conda-to-wheel
> converter would be hard to write. I'd be willing to write one if
> someone can point me at a spec for the conda package format (I
> couldn't find one from a brief look through the docs).
> Conversely, if I have a wininst installer, a wheel or an egg, is there
> a converter to conda format? I can't see having to use conda for some
> things and pip for others as being a move towards lessening user
> confusion.

I see conda as existing at a similar level to apt and yum from a packaging
point of view, with zc.buildout as a DIY equivalent at that level.

For example, I installed Nikola into a virtualenv last night. That required
installing the development headers for libxml2 and libxslt, but the only
error that tells you so comes from the C compiler.

I've been a C programmer longer than I have been a Python one, but I still
had to resort to Google to try to figure out what dev libraries I needed.

Outside the scientific space, crypto libraries are also notoriously hard to
build, as are game engines and GUI toolkits. (I guess database bindings
could also be a problem in some cases.)

We have the option to leave handling the arbitrary binary dependency
problem to platforms, and I think we should take it.

This is why I suspect there will be a better near term effort/reward
trade-off in helping the conda folks improve the usability of their
platform than there is in trying to expand the wheel format to cover
arbitrary binary dependencies.

Longer term, I expect we can expand pip's parallel install capabilities to
support a better multi-version experience (after Python 3.4 is out the
door, I intend to devote some quality time to trying to improve on the
existing pkg_resources multi-version API now that I've been using it long
enough to understand its strengths and limitations) and potentially come
up with a scheme for publishing supporting binaries inside prebuilt wheel
files, but there are plenty of problems ahead of those on the todo list. By
contrast, conda exists now and is being used happily by many members of the
scientific and data analysis communities. It's not vapourware - it already
works for its original target audience, and they love it.

> I see conda (and enthought) as more like distributions - they sit in
> the same space as rpms and debs. I don't see them as alternatives to
> pip. Am I wrong? After all, both conda and enthought supply Python
> interpreter builds as well as package repositories. And both feel like
> unified ecosystems (you manage everything Python-related with conda)
> rather than tools like pip (that works with existing Python standards
> and interoperates with easy_install, distutils, etc).

Exactly correct - in my view, the binary dependency management problem is
exactly what defines the distinction between a cross-platform toolkit like
pip and virtualenv and a new platform like conda.

It's precisely the fact that conda defines a new platform that mostly
ignores the underlying system that:

- would make it a terrible choice for the core cross-platform software
distribution toolkit
- makes it much easier for them to consistently support a compiler-free
experience for end users

Hence the layering proposal: if you're willing to build binary extensions
from source occasionally, pip and virtualenv are already all you need. If
you want someone else to handle the builds for you, rely on a platform.

Like all platforms, conda has gaps in what it provides, but streamlining
the PyPI to conda pipeline is going to be easier than streamlining
inclusion in Linux distros, and Windows and Mac OS X have no equivalent
binary dependency management system in the first place.

> If the issue is simply around defining compatibility tags that better
> describe the various environments around, then let's just get on with
> that - we're going to have to do it in the end anyway, why temporarily
> promote an alternative solution just to change our recommendation
> later?

That will get us to the point of correctly supporting self-contained
extensions that only rely on the Python ABI and the platform C/C++ runtime,
and I agree that is a problem we need to solve at the cross-platform toolkit
level.
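To make the "compatibility tags" point concrete: a wheel's filename carries a Python tag, an ABI tag and a platform tag (the PEP 425/427 naming scheme). The following is a minimal sketch, assuming a well-formed filename and doing none of the name escaping or normalisation a real installer would do; the function names and the example filenames are hypothetical. It shows that those three fields are all a wheel can currently declare about its binary environment, with no slot for an external library such as libxml2.

```python
from typing import NamedTuple, Optional, Set, Tuple


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components.

    Hypothetical helper: assumes the filename is already well-formed and
    does not unescape or normalise the distribution name.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # Optional build tag sits between the version and the Python tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelTags(name, version, build, python, abi, platform)


def expand_tags(tags: WheelTags) -> Set[Tuple[str, str, str]]:
    """Expand compressed tag sets (e.g. "py2.py3") into (python, abi, platform) triples."""
    return {
        (py, abi, plat)
        for py in tags.python.split(".")
        for abi in tags.abi.split(".")
        for plat in tags.platform.split(".")
    }


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    print(parse_wheel_filename("example-1.0-cp33-cp33m-win32.whl"))
    print(sorted(expand_tags(parse_wheel_filename("example-1.0-py2.py3-none-any.whl"))))
```

Everything an installer can match against is the interpreter, the ABI and the OS/architecture; whether the extension also needs, say, libxml2 development headers or a particular shared library is invisible at this level, which is the gap a platform like conda fills.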
Conda is about solving the arbitrary binary dependency problem, in the same
way other platforms do: by creating a pre-integrated collection of built
libraries to cut down on the combinatorial explosion of possibilities.

It isn't a coincidence that the "cross-platform platform" approach grew out
of the scientific community - a solution like that is essential given the
combination of ancient software with arcane build systems and end users
who just want to run their data analysis rather than mess about with
making the software work.

My key point is that if you substitute "beginner" for "scientist" the
desired end user experience is similar, but if you substitute "professional
software developer" or "system integrator", it isn't - we often *want* (or
need) to build from source and do our own integration, so a pre-integrated
approach like conda is inappropriate to our use cases.

I hadn't previously brought this up because I'm not entirely convinced
conda is mature enough yet either, but as noted in the original post, it's
become clear to me lately that people are *already* confused about how
conda relates to pip, and that the lack of a shared understanding of how
they differ has been causing friction on both sides.

That friction is unnecessary - conda is no more a potential replacement for
pip and virtualenv than zc.buildout is, but like zc.buildout, imposing
additional constraints allows conda to solve problems that are difficult,
perhaps even impossible, to handle at the core toolkit level.

This also came up when Donald asked people to tell him about the problems
they saw in packaging:

Different tools: https://github.com/pypa/packaging-problems/issues/23
Build problems with standard tools:

By being clear that we understand *why* the scientific community found a
need to create their own tools, we can also start to make it clearer why
their solutions to their specific problems don't extend to the full
spectrum of use cases that need to be supported at the cross-platform
toolkit level.

> Paul