[Distutils] Working toward Linux wheel support

Nathaniel Smith njs at pobox.com
Thu Sep 10 01:49:31 CEST 2015

On Wed, Sep 9, 2015 at 8:06 AM, Nate Coraor <nate at bx.psu.edu> wrote:
> On Tue, Sep 8, 2015 at 10:10 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Mon, Sep 7, 2015 at 9:02 AM, Donald Stufft <donald at stufft.io> wrote:
>> > On September 3, 2015 at 1:23:03 PM, Nate Coraor (nate at bx.psu.edu) wrote:
>> >> >>>
>> >> >>> I'll create PRs for this against wheel and pip shortly. I can also
>> >> >>> work
>> >> >>> on a PEP for the platform tag - I don't think it's going to need to
>> >> >>> be a
>> >> >>> big one. Are there any preferences as to whether this should be a
>> >> >>> new PEP
>> >> >>> or an update to 425?
>> >> >>>
>> >
>> > Coming back to this, I'm wondering if we should include the libc
>> > implementation/version in a less generic, but still generic linux wheel.
>> > Right
>> > now if you statically link I think the only platform ABIs you need to
>> > worry about
>> > are libc and Python itself. Python itself is handled already but libc is
>> > not.
>> > The only thing I've seen so far is "build on an old enough version of
>> > glibc
>> > that it handles anything sane", but not all versions of Linux even use
>> > glibc at
>> > all.
>> This feels kinda half-baked to me?
>> "linux" is a useful tag because it has a clear meaning: "there exists
>> a linux system somewhere that can run this, but no guarantees about
>> which one, good luck". When building a wheel it's easy to tell whether
>> this tag can be correctly applied.
> I'm not sure how it'd be possible to tell. The same meaning for a generic
> tag would be true of any wheel built, regardless of whether the wheel has
> dependencies in addition to libc.

Sure... my point is just that "linux" is unambiguous and fills a
niche: it unambiguously says "you're on your own", and sometimes
that's the best we can hope to say.

>> Distro-specific tags are useful because they also have a fairly clear
>> meaning: "here's a specific class of systems that can run this, so
>> long as you install enough packages to fulfill the external
>> dependencies". Again, when building a wheel it's pretty easy to tell
>> whether this tag can be correctly applied. (Of course someone could
>> screw this up, e.g. by building on a system that is technically distro X
>> but has some incompatible hand-compiled libraries installed, but 99%
>> of the time we can guess correctly.)
>> If we define an LSB-style base system and give it a tag, like I don't
>> know, the "Python base environment", call it "linux_pybe1_core" or
>> something, that describes what libraries are and aren't available and
>> their ABIs, and provide docs/tooling to help people explicitly create
>> such wheels and check whether they're compatible with their system,
>> then this is also useful -- we have proof that this is sufficient to
>> actually distribute arbitrary software usefully, given that multiple
>> distributors have converged on this strategy. (I've been talking to
>> some people off-list about maybe actually putting together a proposal
>> like this...)
>> To me "linux_glibc_2.18" falls between the cracks though. If this
>> starts being what you get by default when you build a wheel, then
>> people will use it for wheels that are *not* statically linked, and
>> what that tag will mean is "there exists some system that can run
>> this, and that has glibc 2.18 on it, and also some other unspecified
>> stuff, good luck". Which is pretty useless -- we might as well just
>> stick with "linux" in this case. OTOH if it's something that builders
>> have to opt into, then we could document that it's only to be used for
>> wheels that are statically linked except for glibc, and make it mean
>> "*any* system which has glibc 2.18 or later on it can run this". Which
>> would be useful in some cases.
> This is a tooling issue. If wheel (the package) inspects the built .so files
> and finds they are not dynamically linked to anything not included with
> glibc, it can apply the glibc tag. Otherwise, it'd apply the distro tag.
> There's no possibility for human error here, unless they explicitly override
> the platform tag.
>> But at this point it's basically a
>> version of the "defined base environment" approach, and once you've
>> gone that far you might as well take advantage of the various
>> distributors' experience about what should actually be in that
>> environment -- glibc isn't enough.
> While I agree that glibc isn't always enough, defining a base environment
> that may not be met by the "standard" install of popular distributions makes
> unprivileged wheel installation much more difficult.

Yeah, which is why my suggestion is that we steal the "base
environment" definition from the folks like Continuum and Enthought
who have already done the work of determining what is in the
"standard" install of popular distributions, and have spent years
actually distributing packages to unprivileged users :-).

> It's also not going to
> work out of the box on older distributions that wouldn't provide whatever
> standardized mechanism is defined for a list of "base environments currently
> provided by this system" (unless pip does the work itself at runtime to
> determine whether a base environment is met).

Right -- which is basically what pip will have to do to figure out the
current glibc version too, right?
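
A rough sketch of what such a runtime check could look like, assuming the
installer asks the running process for its glibc version through ctypes.
The function names here are hypothetical and are not pip's actual API;
gnu_get_libc_version() is the real glibc call being wrapped.

```python
import ctypes
import re


def glibc_version_string():
    """Return the glibc version of the running process, or None.

    None means the process is not linked against glibc (for example
    musl-based systems or non-Linux platforms).
    """
    try:
        process_namespace = ctypes.CDLL(None)
        gnu_get_libc_version = process_namespace.gnu_get_libc_version
    except (OSError, AttributeError, TypeError):
        # Symbol missing, or dlopen(NULL) is not supported on this platform.
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    version = gnu_get_libc_version()
    if isinstance(version, bytes):
        version = version.decode("ascii")
    return version


def parse_glibc_version(version):
    """Turn '2.18' or '2.20-2014.11' into (2, 18) or (2, 20); None if unparseable."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def glibc_at_least(required, version=None):
    """Heuristic: is the (given or detected) glibc at least `required`?"""
    if version is None:
        version = glibc_version_string()
    if version is None:
        return False
    parsed = parse_glibc_version(version)
    return parsed is not None and parsed >= required
```

As with any such check this is only a heuristic: it compares version
numbers and cannot detect a distro whose local glibc patches break ABI.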

Trying to guess whether the installed versions of several libraries
are really ABI compatible with what we expect is harder than trying to
guess whether the installed version of glibc alone is really ABI
compatible with what we expect, but in both cases it's basically a
heuristic (some distros could have local patches to their glibc that
break ABI, who knows) and in both cases it's basically safe to just
assume it will work (because if we stick to libraries that other
distributors are already depending on then we have years of experience
that it pretty much always works).

> Maybe an important question:
> how many popular packages with C Extensions have dependencies in addition to
> glibc?

Certainly enough that the major distributors of binary packages on
Linux, like Continuum and Enthought, have decided that they need to
require more than glibc :-).

libstdc++ is an example of one particularly common external dependency.

To be clear: if you're talking specifically about the model where we
validate that the extensions are statically linked before we enable
the glibc tag, then I don't think it will do any harm to have it as an
option. It just seems redundant with the more general solution.
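
For illustration, the validation step might be sketched as below. It
assumes the build tool has already captured the text output of
`readelf -d` for each built extension; the soname list, the function
names, and the tag strings passed in are all assumptions made for the
example, not an agreed definition or existing wheel behaviour.

```python
import re

# Assumption: sonames that glibc itself ships (x86-64 loader shown).
# This is an illustrative list, not an official or exhaustive one.
GLIBC_SONAMES = frozenset([
    "libc.so.6",
    "libm.so.6",
    "libdl.so.2",
    "libpthread.so.0",
    "librt.so.1",
    "libutil.so.1",
    "ld-linux-x86-64.so.2",
])

# Matches the DT_NEEDED lines printed by `readelf -d`.
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Extract the DT_NEEDED sonames from `readelf -d` text output."""
    return _NEEDED_RE.findall(readelf_output)


def choose_platform_tag(needed, glibc_tag, distro_tag):
    """Pick the glibc tag only if nothing outside glibc is required.

    Returns (tag, extra), where `extra` lists the non-glibc dependencies
    that forced the fallback to the distro tag.
    """
    extra = [name for name in needed if name not in GLIBC_SONAMES]
    return (distro_tag if extra else glibc_tag), extra
```

An extension that links libstdc++ dynamically would fall back to the
distro tag under this scheme, with "libstdc++.so.6" reported as the
reason.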


Nathaniel J. Smith -- http://vorpus.org

More information about the Distutils-SIG mailing list