[Distutils] Handling the binary dependency management problem

Nick Coghlan ncoghlan at gmail.com
Tue Dec 3 09:48:41 CET 2013


Thanks for the robust feedback folks - it's really helping me to clarify
what I think, and why I consider this an important topic :)

On 3 Dec 2013 10:36, "Chris Barker" <chris.barker at noaa.gov> wrote:
>
> On Mon, Dec 2, 2013 at 5:22 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> And the conda folks are working on playing nice with virtualenv - I
>> don't think we'll see a similar offer from Microsoft for MSI any time
>> soon :)
>
> nice to know...
>>
>> > > a single organisation. Pip (when used normally) communicates with
>> > > PyPI and no single organisation controls the content of PyPI.
>
> Can't you point pip to a "wheelhouse"? How is that different?

Right, you can build integrated environments with wheels - that's one of
the use cases they excel at.
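
For anyone who hasn't tried it yet, pointing pip at a wheelhouse really is
just a couple of commands. A minimal sketch (the directory name and
requirements file are just placeholders, and "pip wheel" needs the wheel
project installed):

    # build wheels for everything in requirements.txt into ./wheelhouse
    pip wheel --wheel-dir=./wheelhouse -r requirements.txt

    # later, install strictly from that directory, without touching PyPI
    pip install --no-index --find-links=./wheelhouse -r requirements.txt

The curation step is then just deciding which wheels end up in that
directory (or behind whatever URL it gets published at).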

>
>> > > For built distributions they could do
>> > > the same - except that pip/PyPI don't provide a mechanism for them to
>> > > do so.
>
> I'm still confused as to what conda provides here -- as near as I can
> tell, conda has a nice hash-based way to ensure binary compatibility --
> which is a good thing. But the "curated set of packages" is an independent
> issue. What's stopping anyone from creating a nice curated set of packages
> with binary wheels (like the Gohlke repo...)?

Hmm, has anyone tried running devpi on a PaaS? :)
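
(Only half joking: from the client side a devpi instance is just another
index, so consuming a curated one would look something like the line below.
The host name is invented, and root/pypi/+simple/ is merely devpi's default
mirroring index layout, so adjust to whatever the curator actually exposes:

    pip install --index-url https://devpi.example.org/root/pypi/+simple/ numpy

The hard part is the curation itself, not the plumbing.)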

> And wouldn't it be better to make wheel a bit more robust in this regard
> than add yet another recommended tool to the mix?

Software that works today is generally more useful to end users than
software that might possibly handle their use case at some currently
unspecified point in the future :)

>> Exactly, this is the difference between pip and conda - conda is a
>> solution for installing from curated *collections* of packages. It's
>> somewhat related to the tagging system people are speculating about for
>> PyPI, but instead of being purely hypothetical, it already exists.
>
> Does it? I only know of one repository of conda packages -- and it
> provides poor support for some things (like wxPython -- does it support
> any desktop GUI on OS X?)
>
> So why do we think that conda is a better option for these unknown
> curated repos?

Because it already works for the scientific stack, and if we don't provide
any explicit messaging around where conda fits into the distribution
picture, users are going to remain confused about it for a long time.

> Also, I'm not sure I WANT any more curated repos -- I'd rather have a
> standard set by python.org that individual package maintainers can choose
> to support.
>
>> PyPI wheels would then be about publishing "default" versions of
>> components, with the broadest compatibility, while conda would be a
>> solution for getting access to alternate builds that may be faster, but
>> require external shared dependencies.
>
> I'm still confused as to why packages need to share external dependencies
> (though I can see why it's nice...).

Because they reference shared external data, communicate through shared
memory, or otherwise need compatible memory layouts. It's exactly the same
reason all C extensions need to use the same C runtime as CPython on
Windows: things like file descriptors break if they don't.

> But what's the new policy here? Anaconda and Canopy exist already? Do we
> need to endorse them? Why? If you want "PyPI wheels would then be about
> publishing "default" versions of components, with the broadest
> compatibility," -- then we still need to improve things a bit, but we
> can't say "we're done".

Conda solves a specific problem for the scientific community, but in their
enthusiasm, the developers are pitching it as a general purpose packaging
solution. It isn't, but in the absence of a clear explanation of its
limitations from us, both its developers and other Python users are likely
to remain confused about the matter.

>
>> What Christoph is doing is producing a cross-platform curated binary
>> software stack, including external dependencies. That's precisely the
>> problem I'm suggesting we *not* try to solve in the core tools any time
>> soon, but instead support bootstrapping conda to solve the problem at a
>> different layer.
>
> So we are advocating that others, like Christoph, create curated stacks
> with conda? Aside from whether conda really provides much more than wheel
> to support doing this, I think it's a BAD idea to encourage it: I'd much
> rather encourage package maintainers to build "standard" packages, so we
> can get some extra interoperability.
>
> Example: you can't use wxPython with Anaconda (on the Mac, anyway). At
> least not without figuring out how to build it yourself, and I'm not sure
> it will even work then (and it is a fricking nightmare to build). But it's
> getting harder to find "standard" packages for the Mac for the SciPy
> stack, so people are really stuck.
>
>> So the pip compatible builds for those tools would likely miss out on
>> some of the external acceleration features,
>
> That's fine -- but we still need those pip compatible builds...
>
> And the nice thing about pip-compatible builds (really
> python.org-compatible builds...) is that they play well with the other
> binary installers --
>>
>> By ceding the "distribution of cross-platform curated software stacks
>> with external binary dependencies" problem to conda, users would get a
>> solution to that problem that they can use *now*,
>
> Well, to be fair, I've been starting a project to provide binaries for
> various packages for OS X and did intend to give conda a good look-see,
> but I really had hoped that wheels were "the way" now... oh well.

Wheels *are* the way if one or both of the following conditions hold:

- you don't need to deal with build variants
- you're building for a specific target environment

That covers an awful lot of ground, but there's one thing it definitely
doesn't cover: distributing multiple versions of NumPy built with different
options, along with cohesive ecosystems built on top of each of them.
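
To make the gap concrete: the wheel filename and tag scheme has nowhere to
record which build options were used, so with today's tools the only way a
curator can offer variants is to park them behind separate wheelhouse or
index URLs and make the user pick one up front (the URLs below are purely
illustrative):

    # hypothetical layout - one wheelhouse per build variant
    pip install --no-index --find-links=https://example.com/wheels/mkl/ numpy
    pip install --no-index --find-links=https://example.com/wheels/atlas/ numpy

That works after a fashion, but nothing stops a user from later mixing in
packages built against the other variant, which is exactly the kind of
mismatch conda's curated collections are meant to prevent.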

Now, there are various ideas for potentially making wheels handle that,
from the scientific community all agreeing on a common set of build
settings and publishing consistent wheels, to a variant tagging system, to
having pip remember the original source of a software distribution, but
they're all vapourware at this point, with no concrete plans to change
that, and plenty of higher priority problems to deal with.

By contrast, conda already exists, and already works, as it was designed
*specifically* to handle the scientific Python stack.

Unfortunately, the folks making conda don't quite grasp the full breadth of
the use cases that pip handles, so their docs do a lousy job of explaining
conda's limitations. It meets their needs perfectly though, along with the
needs of many other people, so they're correspondingly enthusiastic in
wanting to share its benefits with others.

This means that one key reason I want to recommend it for the cases where
it is a good fit (i.e. the scientific Python stack) is so we can explicitly
advise *against* using it in other cases where it will just add complexity
without adding value.

Saying nothing is not an option, since people are already confused. Saying
to never use it isn't an option either, since bootstrapping conda first
*is* a substantially simpler cross-platform way to get up-to-date
scientific Python software onto your system. The alternatives are
platform-specific and (at least in the Linux distro case) slower to get
updates.
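
For the record, the kind of bootstrap I have in mind looks roughly like
this - the exact incantation is the conda developers' to document, and the
package names are just examples:

    # rough sketch: get conda itself from PyPI, then let it manage the stack
    # (the init step is from memory - check the current conda docs)
    pip install conda
    conda init
    conda install numpy scipy matplotlib

Pip gets you to conda, conda gets you the curated scientific stack.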

Cheers,
Nick.

> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>

