On Fri, Aug 14, 2015 at 3:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Aug 13, 2015 at 7:27 PM, Robert Collins
<robertc@robertcollins.net> wrote:
> On 14 August 2015 at 14:14, Nathaniel Smith <njs@pobox.com> wrote:
> ...>
>> Of course if you have an alternative proposal then I'm all ears :-).
>
> Yeah :)
>
> So, I want to dedicate some time to contributing to this discussion
> meaningfully, but I can't for the next few weeks - Jury duty, Kiwi
> PyCon and polishing up the PEPs I'm already committed to...

Totally hear that... it's not super urgent anyway. We should make it
clear to Nate -- hi Nate! -- that there's no reason that solving this
problem should block putting together the basic
binary-compatibility.cfg infrastructure.

Hi!

I've been working on bits of this while also building, as a test case, psycopg2 wheels for lots of different popular distros on i386 and x86_64, UCS2 and UCS4, under Docker. As a result, it's clear that my Linux distro tagging work in wheel's pep425tags has some issues. I've been adding to this list of distributions, but it's going to need a lot more work:

https://bitbucket.org/pypa/wheel/pull-requests/54/soabi-2x-platform-os-distro-support-for/diff#Lwheel/pep425tags.pyT61

So I need a bit of guidance here. I've arbitrarily chosen some tags - `rhel` for example - and wonder if, like PEP 425's mapping of Python implementations to tags, a defined mapping of Linux distributions to shorthand tags is necessary (of course this would be difficult to keep up to date, but binary-compatibility.cfg would make it less relevant in the long run).

Alternatively, I could simply trust and normalize platform.linux_distribution()[0], but this means that the platform tag on RHEL would be something like `linux_x86_64_red_hat_enterprise_linux_server_6_5`
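A rough sketch of what that normalization could look like. This is illustrative only: `normalize_distro_tag` is a hypothetical helper, not code from the pull request, and it takes the name and version as plain strings rather than calling platform.linux_distribution() itself (that function no longer exists in Python 3.8 and later).

```python
import re


def normalize_distro_tag(arch, distro_name, distro_version):
    """Build a platform tag from raw distro name/version strings.

    Hypothetical helper: the arguments stand in for platform.machine()
    and the first two fields of platform.linux_distribution().
    """
    def norm(text):
        # Lowercase, then collapse every run of non-alphanumerics to "_".
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    parts = ["linux", norm(arch), norm(distro_name), norm(distro_version)]
    # Skip empty fields so a missing version does not leave a trailing "_".
    return "_".join(p for p in parts if p)


print(normalize_distro_tag("x86_64", "Red Hat Enterprise Linux Server", "6.5"))
# linux_x86_64_red_hat_enterprise_linux_server_6_5
```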

Finally, by *default*, the built platform tag will include whatever version information is provided in platform.linux_distribution()[1], but the "major-only" version is also included in the list of platforms, so a default debian tag might look like `linux_x86_64_debian_7_8`, but it would be possible to build (and install) `linux_x86_64_debian_7`. However, it may be the case that the default (at least for building, maybe not for installing) ought to be the major-only tag since it should really be ABI compatible with any minor release of that distro.
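To make the full-version versus major-only idea concrete, here is a minimal sketch. The function name `distro_platform_tags` and its arguments are assumptions for illustration, and the ordering (most specific first) is a guess at what an installer would want, not something taken from the pull request.

```python
def distro_platform_tags(arch, distro, version):
    """Return candidate platform tags, most specific first.

    Hypothetical helper: `distro` is assumed to be an already-normalized
    short name such as "debian", and `version` a dotted string like "7.8".
    """
    base = "linux_{}_{}".format(arch, distro)
    pieces = [p for p in version.split(".") if p]
    tags = []
    if pieces:
        # Full version as reported, e.g. linux_x86_64_debian_7_8
        tags.append(base + "_" + "_".join(pieces))
        if len(pieces) > 1:
            # Major-only fallback, e.g. linux_x86_64_debian_7
            tags.append(base + "_" + pieces[0])
    return tags


print(distro_platform_tags("x86_64", "debian", "7.8"))
# ['linux_x86_64_debian_7_8', 'linux_x86_64_debian_7']
```

Making the major-only tag the build default would amount to using the last entry of that list rather than the first.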

--nate

 
> I think the approach of being able to ask the *platform* for things
> needed to build-or-use known artifacts is going to enable a bunch of
> different answers in this space. I'm much more enthusiastic about that
> than doing anything that ends up putting PyPI in competition with the
> distribution space.
>
> My criteria for success are:
>
> - there's *a* migration path from what we have today to what we
> propose. Doesn't have to be good, just exist.
>
>  - authors of scipy, numpy, cryptography etc can upload binary wheels
> for *linux, Mac OSX and Windows 32/64 in a safe and sane way

So the problem is that, IMO, "sane" here means "not building a
separate wheel for every version of distro on distrowatch". So I can
see two ways to do that:
- my suggestion that we just pick a particular highly-compatible
distro like centos 5 to build against, and make a standard list of
which libraries can be assumed to be provided
- the PEP-497-or-number-to-be-determined approach, in which we still
have to pick a highly-compatible distro like centos 5 to build
against, but each wheel has a list of which libraries from that distro
it is counting on being provided

I can see the appeal of the latter approach, since if you want to do
the former approach right you need to be careful about exactly which
libraries you're assuming are present, etc. They both could work. But
in practice, you still have to pick which distro you are going to use
to build, and you still have to say "when I say I need libblas.so.1,
what I mean is that I need a file that is ABI-compatible with the
version of libblas.so.1 that existed in centos 5 exactly, not any
other libblas.so.1". And then in practice not every distro will have
such a thing, so for a project like numpy that wants to make things
easy for a wide variety of users, we'll still only be able to take
advantage of external dependencies for libraries that are effectively
universally available and compatible anyway and end up vendoring the
rest... so in the end basically we'd be distributing exactly the same
wheels under either of these proposals, just the latter requires a
much much more complicated scheme for metadata and installation.
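A small sketch of why the two approaches converge. Everything here is hypothetical: the contents of `BASELINE_LIBS` are only an example of a "standard list", and `libs_to_vendor` is an invented helper, not part of any tool.

```python
# Illustrative baseline only: libraries assumed to be present and
# ABI-compatible on essentially every Linux system.
BASELINE_LIBS = frozenset({"libc.so.6", "libm.so.6", "libpthread.so.0"})


def libs_to_vendor(needed, baseline=BASELINE_LIBS):
    """Return the shared libraries a wheel would have to bundle itself.

    `needed` is the list of sonames the extension modules link against.
    Anything outside the baseline cannot be assumed to exist on the
    user's system, so it ends up vendored inside the wheel.
    """
    return sorted(set(needed) - set(baseline))


print(libs_to_vendor(["libc.so.6", "libm.so.6", "libblas.so.1"]))
# ['libblas.so.1']
```

Whether the baseline is one global standard or is spelled out per wheel, the set left over to vendor is computed the same way.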

And in practice I think the main alternative possibility if we don't
come up with some solid guidance for how packages can build
works-everywhere-wheels is that we'll see wheels for
latest-version-of-Ubuntu-only, plus the occasional smattering of other
distros, varying randomly on a project-by-project basis. Which would
suck.

>  - we don't need to do things like uploading wheels containing
> non-Python shared libraries, nor upload statically linked modules
>
>
> In fact, I think uploading regular .so files is just a huge heartache
> waiting to happen, so I'm almost inclined to add:
>
>  -  we don't support uploading external non-Python libraries [ without
> prejudice for changing our minds in the future]

Windows and OS X don't (reliably) have any package manager. So PyPI
*is* inevitably going to contain non-Python shared libraries or
statically linked modules or something like that. (And in fact it
already contains such things today.) I'm not sure what the alternative
would even be.

This also means that projects like numpy are already forced to accept
that we're on the hook for security updates in our dependencies etc.,
so doing it on Linux too is not really that scary.

Oh, I just thought of another issue: an extremely important
requirement for numpy/scipy/etc. wheels is that they be reliably
installable without root access. People *really* care about this:
missing your grant deadline b/c you can't upgrade some package to fix
some showstopper bug b/c university IT support is not answering calls
at midnight on Sunday = rather poor UX.

Given that, the only situation I can see where we would ever
distribute wheels that require system blas on Linux, is if we were
able to do it alongside wheels that do not require system blas, and
pip were clever enough to reliably always pick the latter except in
cases where the system blas was actually present and working.
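A sketch of the selection logic that would be needed, under loud assumptions: pip has no such feature, the `needs_system_blas` flag is invented metadata, and "working" is approximated here by nothing more than being able to locate and load the library.

```python
import ctypes
import ctypes.util


def system_blas_works(find_library=ctypes.util.find_library, load=ctypes.CDLL):
    """Best-effort probe: can a library named "blas" be found and loaded?

    Loading successfully does not prove ABI compatibility; it only rules
    out the most obvious failures.
    """
    name = find_library("blas")
    if not name:
        return False
    try:
        load(name)
    except OSError:
        return False
    return True


def pick_wheel(candidates, blas_ok):
    """Pick a wheel from dicts with 'name' and 'needs_system_blas' keys.

    The self-contained wheel is the default; the system-blas wheel is
    chosen only when the probe succeeded.
    """
    self_contained = [c for c in candidates if not c["needs_system_blas"]]
    external = [c for c in candidates if c["needs_system_blas"]]
    if blas_ok and external:
        return external[0]
    return self_contained[0] if self_contained else None


wheels = [
    {"name": "numpy-bundled", "needs_system_blas": False},
    {"name": "numpy-sysblas", "needs_system_blas": True},
]
print(pick_wheel(wheels, blas_ok=system_blas_works())["name"])
```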

> There was a post that referenced a numpy ABI, dunno if it was in this
> thread - I need to drill down into that, because I don't understand
> why that's not a regular version resolution problem, unlike the Python
> ABI, which pip can't install [and shouldn't be able to!]

The problem is that numpy is very unusual among Python packages in
that it exposes a large and widely-used *C* API/ABI:

    http://docs.scipy.org/doc/numpy/reference/c-api.html

This means that when you build, e.g., scipy, then you get a binary
that depends on things like the in-memory layout of numpy's internal
objects. We'd like it to be the case that when we release a new
version of numpy, pip could realize "hey, this new version says it has
an incompatible ABI that will break your currently installed version
of scipy -- I'd better fetch a new version of scipy as well, or at
least rebuild the same version I already have". Notice that at the
time scipy is built, it is not known which future version of numpy
will require a rebuild. There are a lot of ways this might work on
both the numpy and pip sides -- definitely fodder for a separate
thread -- but that's the basic problem.
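One of the many ways this might work, as a deliberately simplified sketch. It assumes (hypothetically) that each installed binary package records an integer identifying the numpy ABI it was compiled against, and that a new numpy release declares its own ABI number; neither numpy nor pip exposes such metadata in this form.

```python
def packages_needing_rebuild(new_numpy_abi, installed):
    """Return packages built against a different numpy ABI.

    `installed` maps package name -> the (hypothetical) numpy ABI number
    recorded when that package was built. The check can only happen at
    numpy upgrade time, because the package's own metadata cannot know
    which future numpy release will break it.
    """
    return sorted(pkg for pkg, built in installed.items()
                  if built != new_numpy_abi)


installed = {"scipy": 9, "pandas": 10}
print(packages_needing_rebuild(10, installed))
# ['scipy']
```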

-n

--
Nathaniel J. Smith -- http://vorpus.org