[Distutils] Does anyone understand what's going on with libpython on Linux?
njs at pobox.com
Sun Feb 7 03:01:11 EST 2016
So we found another variation in how different distros build
CPython, and I'm very confused.
Fedora (for example) turns out to work the way I naively expected:
taking py27 as our example, they have:
- libpython2.7.so.1.0 contains the actual python runtime
- /usr/bin/python2.7 is a tiny (~7 KiB) executable that links to
libpython2.7.so.1 to do the actual work; the main python package
depends on the libpython package
- python extension module packages depend on the libpython package,
and contain extension modules linked against libpython2.7.so.1
- python extension modules compiled locally get linked against
libpython2.7.so.1 by default
Debian/Ubuntu do things differently:
- libpython2.7.so.1.0 exists and contains the full python runtime, but
is not installed by default
- /usr/bin/python2.7 *also* contains a *second* copy of the full
python runtime; there is no dependency relationship between these, and
you don't even get libpython2.7.so.1.0 installed unless you explicitly
request it or it gets pulled in through some other dependency
- most python extension module packages do *not* depend on the
libpython2.7 package, and contain extension modules that are *not*
linked against libpython2.7.so.1.0 (but there are exceptions!)
- python extension modules compiled locally do *not* get linked
against libpython2.7.so.1 by default.
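To check which of these two layouts a particular interpreter uses, here is a rough sketch. It assumes CPython 3 on Linux, and `libpython_layout` is a made-up helper, not an existing API. It asks `sysconfig` how the interpreter was configured, then looks through `/proc/self/maps` for a libpython among the files mapped into the running process:

```python
import sysconfig


def libpython_layout() -> dict:
    """Report where the running CPython's runtime lives (Linux-oriented sketch)."""
    mapped = None
    try:
        # Linux-only: one line per mapping of a file into this process.
        with open("/proc/self/maps") as f:
            lines = f.read().splitlines()
    except OSError:
        lines = None  # not Linux, or /proc unavailable
    if lines is not None:
        # The 6th field is the pathname; maxsplit=5 keeps paths with spaces intact.
        mapped = sorted({fields[5] for fields in (l.split(None, 5) for l in lines)
                         if len(fields) == 6 and "libpython" in fields[5]})
    return {
        "Py_ENABLE_SHARED": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        "LDLIBRARY": sysconfig.get_config_var("LDLIBRARY"),
        "INSTSONAME": sysconfig.get_config_var("INSTSONAME"),
        "libpython_mapped": mapped,
    }


print(libpython_layout())
```

On a Fedora-style build I'd expect `Py_ENABLE_SHARED` to be 1 and a libpython path to show up in `libpython_mapped`. With a Debian-style /usr/bin/python2.7, where the runtime lives inside the executable, that list should come back empty.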
The only things that seem to link against libpython2.7.so.1.0 in Debian are:
a) other packages that embed python (e.g. gnucash, paraview, perf, ...)
b) some minority of python packages (e.g. the PySide/QtOpenGL.so
module is one that I found that directly links to libpython2.7.so.1.0)
I guess that the reason this works is that according to ELF linking
rules, the symbols defined in the main executable, or in the
transitive closure of the libraries that the main executable is linked
to via DT_NEEDED entries, are all injected into the global scope of
any dlopen'ed libraries.
Uh, let me try saying that again.
When you dlopen() a library -- like, for example, a python extension
module -- then the extension automatically gets access to any symbols
that are exported from either (a) the main executable itself, or (b)
any of the libraries that are listed if you run 'ldd <the main
executable>'. It also gets access to any symbols that are exported by
itself, or any of the libraries listed if you run 'ldd <the dlopen'ed
library>'. OTOH it does *not* get access to any symbols exported by
other libraries that get dlopen'ed -- each dlopen specifically creates
its own "scope".
So the reason this works is that Debian's /usr/bin/python2.7 itself
exports all the standard Python C ABI symbols, so any extension module
that it loads automatically gets access to the CPython ABI, even if
it doesn't explicitly link to it. And programs like gnucash are linked
directly to libpython2.7.so.1, so they also end up exporting the
CPython ABI to any libraries that they dlopen.
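This global scope can be observed from Python itself. Below is a sketch for CPython on a POSIX system; `c_api_version_via_global_scope` is a made-up name. `ctypes.CDLL(None)` wraps `dlopen(NULL)`, so looking up a C API function through it should succeed only if the main executable, one of its DT_NEEDED libraries, or something loaded with RTLD_GLOBAL exports that function:

```python
import ctypes


def c_api_version_via_global_scope() -> str:
    """Call Py_GetVersion() found via dlopen(NULL)'s global scope (POSIX only)."""
    # dlopen(NULL) returns a handle for the main program; dlsym() on it searches
    # the executable, the libraries it pulled in via DT_NEEDED at startup, and
    # anything later dlopen()ed with RTLD_GLOBAL. A missing symbol makes ctypes
    # raise AttributeError here.
    scope = ctypes.CDLL(None)
    py_get_version = scope.Py_GetVersion
    py_get_version.argtypes = []
    py_get_version.restype = ctypes.c_char_p
    return py_get_version().decode()
```

Calling it should give back the same string as `sys.version`, since CPython fills in `sys.version` from `Py_GetVersion()`. This should hold whether the symbol comes from the executable (Debian-style) or from a DT_NEEDED libpython (Fedora-style).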
But, it seems to me that there are two problems with the Debian/Ubuntu
way of doing things:
1) it's rather wasteful of space, since there are two complete,
independent copies of the whole CPython runtime (one inside
/usr/bin/python2.7, the other inside libpython2.7.so.1).
2) if I ever embed cpython by doing dlopen("libpython2.7.so.1"), or
dlopen("some_plugin_library_linked_to_libpython.so"), then the
embedded cpython will not be able to load python extensions that are
compiled in the Debian-style (but will be able to load python
extensions compiled in the Fedora-style), because the dlopen() that
loaded the python runtime and the dlopen() that loads the extension
module create two different scopes that can't see each other's
symbols. [I'm pretty sure this is right, but linking is arcane and
probably I should write some tests to double check.]
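Short of compiling real test cases, the argument can at least be checked against a toy model of the loader. The model is deliberately simplified: it ignores symbol versioning, RTLD_DEEPBIND, lazy binding and load-order details, and all object names are hypothetical. Each dlopen() gets a local scope made of the opened object plus its DT_NEEDED closure. Symbol lookup consults the global scope first and then that local scope. RTLD_GLOBAL promotes the opened objects into the global scope:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Obj:
    """A toy shared object: the symbols it defines and its DT_NEEDED list."""
    name: str
    defines: set[str] = field(default_factory=set)
    needed: list[str] = field(default_factory=list)


def dt_needed_closure(root: str, objs: dict[str, Obj]) -> list[str]:
    """root plus everything reachable through DT_NEEDED, breadth-first."""
    order, queue = [], [root]
    while queue:
        name = queue.pop(0)
        if name not in order:
            order.append(name)
            queue.extend(objs[name].needed)
    return order


class Process:
    """Toy loader: one global scope plus a separate local scope per dlopen()."""

    def __init__(self, executable: str, objs: dict[str, Obj]):
        self.objs = objs
        # Global scope: the executable and its DT_NEEDED closure.
        self.global_scope = dt_needed_closure(executable, objs)

    def dlopen(self, name: str, rtld_global: bool = False) -> list[str]:
        local_scope = dt_needed_closure(name, self.objs)
        if rtld_global:  # RTLD_GLOBAL: the new objects join the global scope
            self.global_scope += [n for n in local_scope if n not in self.global_scope]
        return local_scope

    def resolves(self, local_scope: list[str], symbol: str) -> bool:
        # Lookup order: global scope first, then the object's own local scope.
        return any(symbol in self.objs[n].defines
                   for n in self.global_scope + local_scope)


# Hypothetical objects mirroring the two layouts described above.
API = {"Py_Initialize", "PyLong_FromLong"}
OBJS = {
    "python2.7": Obj("python2.7", API),       # Debian-style executable
    "host": Obj("host"),                       # a program that embeds via dlopen()
    "libpython2.7.so.1": Obj("libpython2.7.so.1", API),
    "ext_debian.so": Obj("ext_debian.so"),    # no DT_NEEDED on libpython
    "ext_fedora.so": Obj("ext_fedora.so", needed=["libpython2.7.so.1"]),
}


def extension_loads(executable, ext, runtime=None, rtld_global=False):
    """Can `ext` resolve PyLong_FromLong in this toy scenario?"""
    proc = Process(executable, OBJS)
    if runtime is not None:
        proc.dlopen(runtime, rtld_global)
    return proc.resolves(proc.dlopen(ext), "PyLong_FromLong")
```

In this model, a Debian-style extension resolves `PyLong_FromLong` under /usr/bin/python2.7. It stops resolving once the runtime is loaded with a plain dlopen() into `host`. A Fedora-style extension still works through its own DT_NEEDED entry, and opening libpython with RTLD_GLOBAL makes both work. Whether glibc really behaves this way still needs tests against real binaries.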
I guess (2) might be why some of Debian's extension modules do link to
libpython2.7.so.1 directly? Or maybe that's just a bug?
Is there any positive reason in favor of the Debian style approach?
Clearly someone put some work into setting things up this way, so
there must be some motivation, but I'm not sure what it is?
The immediate problem for us is that if a manylinux1 wheel links to
libpythonX.Y.so (Fedora-style), and then it gets run on a Debian
system that doesn't have libpythonX.Y.so installed, it will crash with:

    ImportError: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
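That error comes from the dynamic loader trying to satisfy a DT_NEEDED entry for libpython2.7.so.1.0. To see which extension modules carry such an entry without running them through the loader, here is a minimal pure-Python sketch. It reads DT_NEEDED from an ELF file's `.dynamic` section, roughly what `readelf -d` prints as NEEDED. `elf_needed` and `links_to_libpython` are hypothetical helper names. The parser assumes the file still has section headers and does almost no error handling:

```python
from __future__ import annotations

import struct

SHT_DYNAMIC = 6          # section type of .dynamic
DT_NULL, DT_NEEDED = 0, 1


def elf_needed(path: str) -> list[str]:
    """Return the DT_NEEDED entries of an ELF file, in order.

    Sketch: relies on section headers (kept by plain `strip`) and ignores
    extended section numbering.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    end = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    if data[4] == 2:                        # EI_CLASS: 2 = 64-bit
        (shoff,) = struct.unpack_from(end + "Q", data, 0x28)
        shentsize, shnum = struct.unpack_from(end + "HH", data, 0x3A)
        sh_fmt, dyn_fmt = end + "IIQQQQIIQQ", end + "qQ"
    else:                                   # 32-bit
        (shoff,) = struct.unpack_from(end + "I", data, 0x20)
        shentsize, shnum = struct.unpack_from(end + "HH", data, 0x2E)
        sh_fmt, dyn_fmt = end + "IIIIIIIIII", end + "iI"

    # Each tuple: (name, type, flags, addr, offset, size, link, info, align, entsize)
    sections = [struct.unpack_from(sh_fmt, data, shoff + i * shentsize)
                for i in range(shnum)]
    dyn_size = struct.calcsize(dyn_fmt)
    needed = []
    for sec in sections:
        if sec[1] != SHT_DYNAMIC:
            continue
        strtab_off = sections[sec[6]][4]    # sh_link -> file offset of .dynstr
        for off in range(sec[4], sec[4] + sec[5], dyn_size):
            tag, val = struct.unpack_from(dyn_fmt, data, off)
            if tag == DT_NULL:
                break
            if tag == DT_NEEDED:
                start = strtab_off + val
                needed.append(data[start:data.index(b"\0", start)].decode())
    return needed


def links_to_libpython(path: str) -> bool:
    """True if the object has a DT_NEEDED entry for some libpython."""
    return any(name.startswith("libpython") for name in elf_needed(path))
```

For example, `links_to_libpython("foo.so")` on a hypothetical `foo.so` should be True for a Fedora-style build and False for a Debian-style one.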
Maybe this is okay and the solution is to tell people that they need
to 'apt install libpython2.7'. In a sense this isn't even a
regression, because every system that is capable of installing a
binary extension from an sdist has python2.7-dev installed, which
depends on libpython2.7 --> therefore every system that used to be
able to do 'pip install somebinary' with sdists will still be able to
do it with manylinux1 builds.
The alternative is to declare that manylinux1 extensions should not
link to libpython. This should, I believe, work fine on both
Debian-style and Fedora-style installations -- though the PySide
example, and the theoretical issue with embedding python through
dlopen, both give me some pause.
Two more questions:
- What are Debian/Ubuntu doing in distutils so that extensions don't
link to libpython by default? If we do go with the option of saying
that manylinux extensions shouldn't link to libpython, then that's
something auditwheel *can* fix up, but it'd be even nicer if we could
set up the docker image to get it right in the first place.
- Can/should Debian/Ubuntu switch to the Fedora model? Obviously it
would take quite some time before a generic platform like manylinux
could assume that this had happened, but it does seem better to me...?
And if it's going to happen at all it might be nice to get the switch
into 16.04 LTS? Of course that's probably ambitious, even if I'm not
missing some reason why the Debian/Ubuntu model is actually better.
Nathaniel J. Smith -- https://vorpus.org