[Distutils] draft PEP: manylinux1

Nathaniel Smith njs at pobox.com
Wed Jan 20 22:55:12 EST 2016


Hi all,

Here's a first draft of a PEP for the manylinux1 platform tag
mentioned earlier, posted for feedback. Really Robert McGibbon should
get the main credit for this, since he wrote it, and also the docker
image and the amazing auditwheel tool linked below, but he asked me to
do the honors of posting it :-).

BTW, if anyone wants to try this out, there are some test
"manylinux1-compatible" wheels at
  https://vorpus.org/~njs/tmp/manylinux-test-wheels/repaired
for PySide (i.e. Qt) and numpy (using openblas). They should be
installable on any ordinary linux system with:
  pip install --no-index -f https://vorpus.org/~njs/tmp/manylinux-test-wheels/repaired $PKG
(Note that this may require a reasonably up-to-date pip -- e.g. the
one in Debian is too old, which confused me for a bit.)

(How they were created: docker run -it quay.io/manylinux/manylinux
bash; install conda to get builds of Qt and OpenBLAS because I
was too lazy to do it myself; pip wheel PySide / pip wheel numpy;
auditwheel repair <the resulting wheels>, which copies in all the
dependencies to make the wheels self-contained. Just proof-of-concept
for now, but they seem to work.)

----

PEP: XXXX
Title: A Platform Tag for Portable Linux Built Distributions
Version: $Revision$
Last-Modified: $Date$
Author: Robert T. McGibbon <rmcgibbo at gmail.com>, Nathaniel J. Smith
<njs at pobox.com>
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 19-Jan-2016
Post-History: 19-Jan-2016


Abstract
========

This PEP proposes the creation of a new platform tag for Python package built
distributions, such as wheels, called ``manylinux1_{x86_64,i386}`` with
external dependencies limited to a standardized, restricted subset of
the Linux kernel and core userspace ABI. It proposes that PyPI support
uploading and distributing wheels with this platform tag, and that ``pip``
support downloading and installing these packages on compatible platforms.


Rationale
=========

Currently, distribution of binary Python extensions for Windows and OS X is
straightforward. Developers and packagers build wheels, which are assigned
platform tags such as ``win32`` or ``macosx_10_6_intel``, and upload these
wheels to PyPI. Users can download and install these wheels using tools such
as ``pip``.

For Linux, the situation is much more delicate. In general, compiled Python
extension modules built on one Linux distribution will not work on other Linux
distributions, or even on the same Linux distribution with different system
libraries installed.

Build tools using PEP 425 platform tags [1]_ do not track information about the
particular Linux distribution or installed system libraries, and instead assign
all wheels the too-vague ``linux_i386`` or ``linux_x86_64`` tags. Because of
this ambiguity, there is no expectation that ``linux``-tagged built
distributions compiled on one machine will work properly on another, and for
this reason, PyPI has not permitted the uploading of wheels for Linux.

It would be ideal if wheel packages could be compiled that would work on *any*
linux system. But, because of the incredible diversity of Linux systems -- from
PCs to Android to embedded systems with custom libcs -- this cannot
be guaranteed in general.

Instead, we define a standard subset of the kernel+core userspace ABI that,
in practice, is compatible enough that packages conforming to this standard
will work on *many* linux systems, including essentially all of the desktop
and server distributions in common use. We know this because there are
companies who have been distributing such widely-portable pre-compiled Python
extension modules for Linux -- e.g. Enthought with Canopy [2]_ and Continuum
Analytics with Anaconda [3]_.

Building on the compatibility lessons learned from these companies, we thus
define a baseline ``manylinux1`` platform tag for use by binary Python
wheels, and introduce the implementation of preliminary tools to aid in the
construction of these ``manylinux1`` wheels.


Key Causes of Inter-Linux Binary Incompatibility
================================================

To properly define a standard that will guarantee that wheel packages meeting
this specification will operate on *many* linux platforms, it is necessary to
understand the root causes which often prevent portability of pre-compiled
binaries on Linux. The two key causes are dependencies on shared libraries
which are not present on users' systems, and dependencies on particular
versions of certain core libraries like ``glibc``.


External Shared Libraries
-------------------------

Most desktop and server linux distributions come with a system package manager
(examples include ``APT`` on Debian-based systems, ``yum`` on
``RPM``-based systems, and ``pacman`` on Arch linux) that manages, among other
responsibilities, the installation of shared libraries to system
directories such as ``/usr/lib``. Most non-trivial Python extensions will depend
on one or more of these shared libraries, and thus function properly only on
systems where the user has the proper libraries (and the proper
versions thereof), either installed using their package manager, or installed
manually, with environment variables such as ``LD_LIBRARY_PATH`` set
to notify the runtime linker of the location of the depended-upon shared
libraries.


Versioning of Core Shared Libraries
-----------------------------------

Even if the author or maintainers of a Python extension module wish to use no
external shared libraries, the modules will generally have a dynamic runtime
dependency on the GNU C library, ``glibc``. While it is possible, statically
linking ``glibc`` is usually a bad idea because of bloat, and because certain
important C functions like ``dlopen()`` cannot be called from code that
statically links ``glibc``. A runtime shared library dependency on a
system-provided ``glibc`` is unavoidable in practice.

The maintainers of the GNU C library follow a strict symbol versioning scheme
for backward compatibility. This ensures that binaries compiled against an older
version of ``glibc`` can run on systems that have a newer ``glibc``. The
opposite is generally not true -- binaries compiled on newer Linux
distributions tend to rely upon versioned functions in glibc that are not
available on older systems.

This generally prevents built distributions compiled on the latest Linux
distributions from being portable.


The ``manylinux1`` policy
=========================

For these reasons, to achieve broad portability, Python wheels

 * should depend only on an extremely limited set of external shared
   libraries; and
 * should depend only on ``old`` symbol versions in those external shared
   libraries.

The ``manylinux1`` policy thus encompasses a standard for which
external shared libraries a wheel may depend on, and the maximum
depended-upon symbol versions therein.

The permitted external shared libraries are: ::

    libpanelw.so.5
    libncursesw.so.5
    libgcc_s.so.1
    libstdc++.so.6
    libm.so.6
    libdl.so.2
    librt.so.1
    libcrypt.so.1
    libc.so.6
    libnsl.so.1
    libutil.so.1
    libpthread.so.0
    libX11.so.6
    libXext.so.6
    libXrender.so.1
    libICE.so.6
    libSM.so.6
    libGL.so.1
    libgobject-2.0.so.0
    libgthread-2.0.so.0
    libglib-2.0.so.0

On Debian-based systems, these libraries are provided by the packages ::

    libncurses5 libgcc1 libstdc++6 libc6 libx11-6 libxext6
    libxrender1 libice6 libsm6 libgl1-mesa-glx libglib2.0-0

On RPM-based systems, these libraries are provided by the packages ::

    ncurses libgcc libstdc++ glibc libXext libXrender
    libICE libSM mesa-libGL glib2

This list was compiled by checking the external shared library dependencies of
the Canopy [2]_ and Anaconda [3]_ distributions, which both include a wide array
of the most popular Python modules and have been confirmed in practice to work
across a wide swath of Linux systems in the wild.
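
As a rough illustration of how the permitted list might be applied, the
following sketch checks a list of sonames against it. The names
``PERMITTED_LIBS`` and ``disallowed_libraries`` are hypothetical and not part
of any existing tool, and extracting the sonames from a real extension module
(e.g. from its ELF ``DT_NEEDED`` entries) is assumed to happen elsewhere.

```python
# Hypothetical helper; the set mirrors the permitted list above.
PERMITTED_LIBS = frozenset("""
    libpanelw.so.5 libncursesw.so.5 libgcc_s.so.1 libstdc++.so.6 libm.so.6
    libdl.so.2 librt.so.1 libcrypt.so.1 libc.so.6 libnsl.so.1 libutil.so.1
    libpthread.so.0 libX11.so.6 libXext.so.6 libXrender.so.1 libICE.so.6
    libSM.so.6 libGL.so.1 libgobject-2.0.so.0 libgthread-2.0.so.0
    libglib-2.0.so.0
""".split())


def disallowed_libraries(needed):
    """Return the sonames in `needed` that the policy does not permit."""
    return sorted(set(needed) - PERMITTED_LIBS)
```

A wheel whose extension modules yield an empty result here would still need
to pass the symbol version check before it could be considered conforming.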

For dependencies on externally-provided versioned symbols in the above shared
libraries, the following symbol versions are permitted: ::

    GLIBC <= 2.5
    CXXABI <= 3.4.8
    GLIBCXX <= 3.4.9
    GCC <= 4.2.0

These symbol versions were determined by inspecting the latest symbol version
provided in the libraries distributed with CentOS 5, a Linux distribution
released in April 2007. In practice, this means that Python wheels which conform
to this policy should function on almost any linux distribution released after
this date.
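
To make the version limits concrete, here is a minimal sketch that scans
text for versioned symbol names such as ``GLIBC_2.7`` and reports those
newer than the policy allows. It assumes the text comes from some
symbol-dumping tool (for instance the output of ``objdump -T``); the names
``MAX_VERSIONS`` and ``too_new_symbol_versions`` are made up for this
example, and this is not how ``auditwheel`` itself is implemented.

```python
import re

# Maximum permitted version per symbol-version prefix (from the policy above).
MAX_VERSIONS = {
    "GLIBC": (2, 5),
    "CXXABI": (3, 4, 8),
    "GLIBCXX": (3, 4, 9),
    "GCC": (4, 2, 0),
}

# Matches tokens such as "GLIBC_2.2.5" or "GLIBCXX_3.4.15".
_VERSIONED = re.compile(r"\b(GLIBCXX|GLIBC|CXXABI|GCC)_(\d+(?:\.\d+)*)\b")


def too_new_symbol_versions(text):
    """Return sorted symbol-version names in `text` that exceed the policy."""
    offenders = set()
    for prefix, number in _VERSIONED.findall(text):
        version = tuple(int(part) for part in number.split("."))
        # Tuple comparison orders (2, 2, 5) before (2, 5) before (2, 7).
        if version > MAX_VERSIONS[prefix]:
            offenders.add("%s_%s" % (prefix, number))
    return sorted(offenders)
```

Versions are compared as integer tuples rather than as strings, so that
``3.4.15`` is correctly treated as newer than ``3.4.9``.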


Compilation and Tooling
=======================

To support the compilation of wheels meeting the ``manylinux1`` standard, we
provide initial drafts of two tools.

The first is a Docker image based on CentOS 5.11, which is recommended as an
easy to use self-contained build box for compiling ``manylinux1`` wheels [4]_.
Compiling on a more recently-released linux distribution will generally
introduce dependencies on too-new versioned symbols. The image comes with a
full compiler suite installed (``gcc``, ``g++``, and ``gfortran`` 4.8.2) as
well as the latest releases of Python and pip.

The second tool is a command line executable called ``auditwheel`` [5]_. First,
it inspects all of the ELF files inside a wheel to check for dependencies on
versioned symbols or external shared libraries, and verifies conformance with
the ``manylinux1`` policy. This includes the ability to add the new platform
tag to conforming wheels.
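
The first step, locating the ELF files inside a wheel, can be sketched as
follows. This is only an illustration and not ``auditwheel``'s actual code:
it relies on wheels being zip archives and on ELF files starting with the
four magic bytes ``\x7fELF``, and the function name ``elf_members`` is
hypothetical.

```python
import zipfile

ELF_MAGIC = b"\x7fELF"


def elf_members(wheel_path):
    """Return names of archive members that begin with the ELF magic bytes."""
    names = []
    with zipfile.ZipFile(wheel_path) as archive:
        for info in archive.infolist():
            with archive.open(info) as member:
                # Only the first four bytes are needed to recognise ELF.
                if member.read(4) == ELF_MAGIC:
                    names.append(info.filename)
    return names
```

Checking the magic bytes rather than the file extension also catches shared
libraries with unusual names; each file found would then be examined for its
library and symbol version dependencies.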

In addition, ``auditwheel`` has the ability to automatically modify wheels that
depend on external shared libraries by copying those shared libraries from
the system into the wheel itself, and modifying the appropriate RPATH entries
such that these libraries will be picked up at runtime. The result is
similar to statically linking the libraries, but requires no changes to
the build system.

Neither of these tools is necessary to build wheels which conform to the
``manylinux1`` policy. Similar results can usually be achieved by statically
linking external dependencies and/or using certain inline assembly constructs
to instruct the linker to prefer older symbol versions; however, these tricks
can be quite esoteric.


Platform Detection for Installers
=================================

Because the ``manylinux1`` profile is already known to work for the many
thousands of users of popular commercial Python distributions, we suggest that
installation tools like ``pip`` should err on the side of assuming that a
system *is* compatible, unless there is specific reason to think otherwise.

We know of three main sources of potential incompatibility that are likely to
arise in practice:

* A linux distribution that is too old (e.g. RHEL 4)
* A linux distribution that does not use glibc (e.g. Alpine Linux, which is
  based on musl libc, or Android)
* Eventually, in the future, there may exist distributions that break
  compatibility with this profile

To handle the first two cases, we propose the following simple and reliable
check: ::

    def have_glibc_version(major, minimum_minor):
        import ctypes
        import re

        process_namespace = ctypes.CDLL(None)
        try:
            gnu_get_libc_version = process_namespace.gnu_get_libc_version
        except AttributeError:
            # We are not linked to glibc.
            return False

        gnu_get_libc_version.restype = ctypes.c_char_p
        version_str = gnu_get_libc_version()
        # py2 / py3 compatibility:
        if not isinstance(version_str, str):
            version_str = version_str.decode("ascii")

        # Parse only the leading "major.minor"; the string may carry
        # extra text after the minor number.
        match = re.match(r"(\d+)\.(\d+)", version_str)
        if match is None:
            return False
        version = [int(match.group(1)), int(match.group(2))]
        if major != version[0]:
            return False
        if minimum_minor > version[1]:
            return False
        return True

    # CentOS 5 uses glibc 2.5.
    is_manylinux1_compatible = have_glibc_version(2, 5)

To handle the third case, we propose the creation of a file
``/etc/python/compatibility.cfg`` in ConfigParser format, with sample
contents: ::

   [manylinux1]
   compatible = true

where the supported values for the ``manylinux1.compatible`` entry are the
same as those supported by the ConfigParser ``getboolean`` method.

The proposed logic for ``pip`` or related tools, then, is:

0) If ``distutils.util.get_platform()`` does not start with the string
   ``"linux"``, then assume the current system is not ``manylinux1``
   compatible.
1) If ``/etc/python/compatibility.cfg`` exists and contains a ``manylinux1``
   key, then trust that.
2) Otherwise, if ``have_glibc_version(2, 5)`` returns true, then assume the
   current system can handle ``manylinux1`` wheels.
3) Otherwise, assume that the current system cannot handle ``manylinux1``
   wheels.
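
The steps above can be sketched as a single function. This is a minimal
illustration assuming Python 3 (for the ``configparser`` module name); the
function signature is hypothetical. The platform string, the result of the
glibc check, and the path of the configuration file are all passed in as
arguments so that the example stays self-contained.

```python
import configparser
import os


def is_manylinux1_compatible(platform, glibc_ok, config_path):
    """Apply steps 0-3 above.

    `platform` is the value of distutils.util.get_platform(), `glibc_ok` is
    the result of have_glibc_version(2, 5), and `config_path` is the location
    of the compatibility configuration file.
    """
    # Step 0: only Linux platforms are candidates.
    if not platform.startswith("linux"):
        return False
    # Step 1: an explicit setting in the configuration file wins.
    if os.path.exists(config_path):
        parser = configparser.ConfigParser()
        parser.read(config_path)
        if parser.has_option("manylinux1", "compatible"):
            return parser.getboolean("manylinux1", "compatible")
    # Steps 2 and 3: otherwise fall back to the glibc check.
    return bool(glibc_ok)
```

Note that a configuration file without a ``manylinux1`` entry falls through
to the glibc check, so a distributor only needs to write the file when the
default answer would be wrong.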


Security Implications
=====================

One of the advantages of dependencies on centralized libraries in Linux is
that bugfixes and security updates can be deployed system-wide, and
applications which depend on these libraries will automatically feel the
effects of these patches when the underlying libraries are updated. This can
be particularly important for security updates in packages that handle
communication across the network or cryptography.

``manylinux1`` wheels distributed through PyPI that bundle security-critical
libraries like OpenSSL will thus assume responsibility for prompt updates in
response to disclosed vulnerabilities and patches. This closely parallels the
security implications of the distribution of binary wheels on Windows that,
because the platform lacks a system package manager, generally bundle their
dependencies. In particular, because it lacks a stable ABI, OpenSSL cannot be
included in the ``manylinux1`` profile.


Rejected Alternatives
=====================

One alternative would be to provide separate platform tags for each Linux
distribution (and each version thereof), e.g. ``RHEL6``, ``ubuntu14_10``,
``debian_jessie``, etc. Nothing in this proposal rules out the possibility of
adding such platform tags in the future, or of further extensions to wheel
metadata that would allow wheels to declare dependencies on external
system-installed packages. However, such extensions would require substantially
more work than this proposal, and still might not be appreciated by package
developers who would prefer not to have to maintain multiple build environments
and build multiple wheels in order to cover all the common Linux distributions.
Therefore we consider such proposals to be out-of-scope for this PEP.


References
==========

.. [1] PEP 425 -- Compatibility Tags for Built Distributions
   (https://www.python.org/dev/peps/pep-0425/)
.. [2] Enthought Canopy Python Distribution
   (https://store.enthought.com/downloads/)
.. [3] Continuum Analytics Anaconda Python Distribution
   (https://www.continuum.io/downloads)
.. [4] manylinux1 docker image
   (https://quay.io/repository/manylinux/manylinux)
.. [5] auditwheel
   (https://pypi.python.org/pypi/auditwheel)

Copyright
=========

This document has been placed into the public domain.

..

   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


-- 
Nathaniel J. Smith -- https://vorpus.org

