On Fri, Aug 14, 2015 at 3:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Aug 13, 2015 at 7:27 PM, Robert Collins <robertc@robertcollins.net> wrote:
On 14 August 2015 at 14:14, Nathaniel Smith <njs@pobox.com> wrote: ...
Of course if you have an alternative proposal then I'm all ears :-).
Yeah :)
So, I want to dedicate some time to contributing to this discussion meaningfully, but I can't for the next few weeks - jury duty, Kiwi PyCon and polishing up the PEPs I'm already committed to...
Totally hear that... it's not super urgent anyway. We should make it clear to Nate -- hi Nate! -- that there's no reason that solving this problem should block putting together the basic binary-compatibility.cfg infrastructure.
Hi! I've been working on bits of this as I've also been working on, as a test case, building out psycopg2 wheels for lots of different popular distros on i386 and x86_64, UCS2 and UCS4, under Docker. As a result, it's clear that my Linux distro tagging work in wheel's pep425tags has some issues. I've been adding to this list of distributions but it's going to need a lot more work: https://bitbucket.org/pypa/wheel/pull-requests/54/soabi-2x-platform-os-distr...
So I need a bit of guidance here. I've arbitrarily chosen some tags - `rhel` for example - and wonder if, like PEP 425's mapping of Python implementations to tags, a defined mapping of Linux distributions to shorthand tags is necessary (of course this would be difficult to keep up to date, but binary-compatibility.cfg would make it less relevant in the long run).
Alternatively, I could simply trust and normalize platform.linux_distribution()[0], but this means that the platform tag on RHEL would be something like `linux_x86_64_red_hat_enterprise_linux_server_6_5`.
Finally, by *default*, the built platform tag will include whatever version information is provided in platform.linux_distribution()[1], but the "major-only" version is also included in the list of platforms, so a default debian tag might look like `linux_x86_64_debian_7_8`, but it would be possible to build (and install) `linux_x86_64_debian_7`. However, it may be the case that the default (at least for building, maybe not for installing) ought to be the major-only tag since it should really be ABI compatible with any minor release of that distro.
--nate
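To make the tagging scheme described above concrete, here is a rough sketch of the "trust and normalize" option. It is not the code from the wheel pull request; the function names `normalize` and `linux_platform_tags` are made up for illustration, and the distro name and version are passed in as plain strings because platform.linux_distribution() was deprecated in Python 3.5 and removed in 3.8.

```python
import re


def normalize(part):
    """Lowercase and collapse each run of non-alphanumerics into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", part.lower()).strip("_")


def linux_platform_tags(arch, distro_name, distro_version):
    """Return candidate platform tags, most specific first.

    Hypothetical helper: the full-version tag comes first, followed by the
    "major-only" tag when it differs from the full one.
    """
    base = "linux_" + normalize(arch)
    name = normalize(distro_name)
    full = normalize(distro_version)
    major = normalize(distro_version.split(".")[0])

    tags = ["{}_{}_{}".format(base, name, full)]
    if major and major != full:
        tags.append("{}_{}_{}".format(base, name, major))
    return tags


if __name__ == "__main__":
    print(linux_platform_tags("x86_64", "debian", "7.8"))
    print(linux_platform_tags("x86_64", "Red Hat Enterprise Linux Server", "6.5"))
```

Making the major-only tag the build default would just mean picking the last element of the list instead of the first.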
I think the approach of being able to ask the *platform* for things needed to build-or-use known artifacts is going to enable a bunch of different answers in this space. I'm much more enthusiastic about that than doing anything that ends up putting PyPI in competition with the distribution space.
My criteria for success are:
- there's *a* migration path from what we have today to what we propose. Doesn't have to be good, just exist.
- authors of scipy, numpy, cryptography etc can upload binary wheels for *linux, Mac OSX and Windows 32/64 in a safe and sane way
So the problem is that, IMO, "sane" here means "not building a separate wheel for every version of distro on distrowatch". So I can see two ways to do that:
- my suggestion that we just pick a particular highly-compatible distro like centos 5 to build against, and make a standard list of which libraries can be assumed to be provided
- the PEP-497-or-number-to-be-determined approach, in which we still have to pick a highly-compatible distro like centos 5 to build against, but each wheel has a list of which libraries from that distro it is counting on being provided
I can see the appeal of the latter approach, since if you want to do the former approach right you need to be careful about exactly which libraries you're assuming are present, etc. They both could work. But in practice, you still have to pick which distro you are going to use to build, and you still have to say "when I say I need libblas.so.1, what I mean is that I need a file that is ABI-compatible with the version of libblas.so.1 that existed in centos 5 exactly, not any other libblas.so.1". And then in practice not every distro will have such a thing, so for a project like numpy that wants to make things easy for a wide variety of users, we'll still only be able to take advantage of external dependencies for libraries that are effectively universally available and compatible anyway and end up vendoring the rest... so in the end basically we'd be distributing exactly the same wheels under either of these proposals, just the latter requires a much much more complicated scheme for metadata and installation.
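As a sketch of what the "standard list" approach amounts to at build time: anything a wheel links against that is not on the agreed list has to be vendored. The list contents and the name `libraries_to_vendor` below are purely illustrative assumptions, not a proposed standard.

```python
# Illustrative only: a real list would have to be agreed on and pinned to the
# ABI of the chosen build distro (e.g. centos 5).
ASSUMED_PRESENT = frozenset({
    "libc.so.6",
    "libm.so.6",
    "libpthread.so.0",
    "libdl.so.2",
})


def libraries_to_vendor(needed, assumed_present=ASSUMED_PRESENT):
    """Return the sonames a wheel must bundle because they are not on the list."""
    return sorted(set(needed) - set(assumed_present))


if __name__ == "__main__":
    print(libraries_to_vendor(["libc.so.6", "libm.so.6", "libblas.so.1"]))
```

Under the per-wheel-list approach the same set difference would still be computed, just against a list carried in each wheel's metadata rather than a single shared one.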
And in practice I think the main alternative possibility if we don't come up with some solid guidance for how packages can build works-everywhere-wheels is that we'll see wheels for latest-version-of-Ubuntu-only, plus the occasional smattering of other distros, varying randomly on a project-by-project basis. Which would suck.
- we don't need to do things like uploading wheels containing non-Python shared libraries, nor upload statically linked modules
In fact, I think uploading regular .so files is just a huge heartache waiting to happen, so I'm almost inclined to add:
- we don't support uploading external non-Python libraries [without prejudice for changing our minds in the future]
Windows and OS X don't (reliably) have any package manager. So PyPI *is* inevitably going to contain non-Python shared libraries or statically linked modules or something like that. (And in fact it already contains such things today.) I'm not sure what the alternative would even be.
This also means that projects like numpy are already forced to accept that we're on the hook for security updates in our dependencies etc., so doing it on Linux too is not really that scary.
Oh, I just thought of another issue: an extremely important requirement for numpy/scipy/etc. wheels is that they be reliably installable without root access. People *really* care about this: missing your grant deadline b/c you can't upgrade some package to fix some showstopper bug b/c university IT support is not answering calls at midnight on Sunday = rather poor UX.
Given that, the only situation I can see where we would ever distribute wheels that require system blas on Linux, is if we were able to do it alongside wheels that do not require system blas, and pip were clever enough to reliably always pick the latter except in cases where the system blas was actually present and working.
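Roughly, the selection rule would look something like the sketch below. This is not how pip works today; `pick_wheel`, the candidate format and the file names are all hypothetical. It uses `ctypes.util.find_library` as the default probe, which only tells you that a library can be located, not that it is actually working, so a real implementation would need a stronger check.

```python
from ctypes.util import find_library


def pick_wheel(candidates, probe=find_library):
    """Pick a wheel from (filename, required_system_libs) pairs.

    A wheel is usable only if every system library it requires is found.
    Among usable wheels, prefer the one relying on the most system libraries,
    so the self-contained wheel (empty requirement list) is the fallback.
    Returns None if nothing is usable.
    """
    usable = [
        (filename, libs)
        for filename, libs in candidates
        if all(probe(lib) is not None for lib in libs)
    ]
    if not usable:
        return None
    return max(usable, key=lambda item: len(item[1]))[0]


if __name__ == "__main__":
    wheels = [
        ("numpy-vendored.whl", []),           # hypothetical file names
        ("numpy-systemblas.whl", ["blas"]),
    ]
    print(pick_wheel(wheels))
```

The `probe` parameter is only there so the rule can be exercised without depending on what happens to be installed on the machine.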
There was a post that referenced a numpy ABI, dunno if it was in this thread - I need to drill down into that, because I don't understand why that's not a regular version resolution problem, unlike the Python ABI, which pip can't install [and shouldn't be able to!]
The problem is that numpy is very unusual among Python packages in that it exposes a large and widely-used *C* API/ABI:
http://docs.scipy.org/doc/numpy/reference/c-api.html
This means that when you build, e.g., scipy, then you get a binary that depends on things like the in-memory layout of numpy's internal objects. We'd like it to be the case that when we release a new version of numpy, pip could realize "hey, this new version says it has an incompatible ABI that will break your currently installed version of scipy -- I'd better fetch a new version of scipy as well, or at least rebuild the same version I already have". Notice that at the time scipy is built, it is not known which future version of numpy will require a rebuild. There are a lot of ways this might work on both the numpy and pip sides -- definitely fodder for a separate thread -- but that's the basic problem.
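One way to picture the check, purely as an illustration: assume each installed binary records the numpy ABI version it was built against, and the new numpy release declares its own. Neither piece of metadata exists in this form today, and `packages_needing_rebuild` is a made-up name.

```python
def packages_needing_rebuild(new_numpy_abi, built_against):
    """Return the packages whose recorded numpy ABI differs from the new one.

    built_against maps package name -> numpy ABI version recorded at build
    time (hypothetical metadata). Strict equality is an assumption; a real
    scheme might accept a range of compatible versions.
    """
    return sorted(
        name for name, abi in built_against.items() if abi != new_numpy_abi
    )


if __name__ == "__main__":
    installed = {"scipy": 9, "pandas": 10}
    print(packages_needing_rebuild(10, installed))
```

Note that the comparison can only be made when the new numpy is about to be installed, which is exactly why scipy's own metadata can't express the constraint up front.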
-n
-- Nathaniel J. Smith -- http://vorpus.org