Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

Hi all, I went ahead and tried to collect a list of all of the libraries that could be considered to constitute the "base" system for linux-64. The strategy I used was to leverage off the work done by the folks at Continuum by searching through their pre-compiled binaries from https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that were depended on (according to ldd) but were not accounted for by the declared dependencies that each package made known to the conda package manager. The full list of these system libraries, sorted from most commonly depended-on to rarest, is below. There are 158 of them. ['linux-vdso.so.1', 'libc.so.6', 'libpthread.so.0', 'libm.so.6', 'libdl.so.2', 'libutil.so.1', 'libgcc_s.so.1', 'libstdc++.so.6', 'libexpat.so.1', 'librt.so.1', 'libpng12.so.0', 'libcrypt.so.1', 'libffi.so.6', 'libresolv.so.2', 'libkeyutils.so.1', 'libcom_err.so.2', 'libp11-kit.so.0', 'libkrb5.so.26', 'libheimntlm.so.0', 'libtasn1.so.6', 'libheimbase.so.1', 'libgssapi.so.3', 'libroken.so.18', 'libhcrypto.so.4', 'libhogweed.so.4', 'libnettle.so.6', 'libhx509.so.5', 'libwind.so.0', 'libgnutls-deb0.so.28', 'libasn1.so.8', 'libgmp.so.10', 'libsasl2.so.2', 'libidn.so.11', 'librtmp.so.1', 'liblber-2.4.so.2', 'libldap_r-2.4.so.2', 'libXdmcp.so.6', 'libX11.so.6', 'libXau.so.6', 'libxcb.so.1', 'libgssapi_krb5.so.2', 'libkrb5.so.3', 'libk5crypto.so.3', 'libkrb5support.so.0', 'libicudata.so.55', 'libicuuc.so.55', 'libhdf5_serial.so.10', 'libcurl-gnutls.so.4', 'libhdf5_serial_hl.so.10', 'libtinfo.so.5', 'libgcrypt.so.20', 'libgpg-error.so.0', 'libnsl.so.1', 'libXext.so.6', 'libncursesw.so.5', 'libpanelw.so.5', 'libXrender.so.1', 'libjbig.so.0', 'libpcre.so.3', 'libglib-2.0.so.0', 'libnvidia-tls.so.352.41', 'libnvidia-glcore.so.352.41', 'libGL.so.1', 'libuuid.so.1', 'libSM.so.6', 'libICE.so.6', 'libgobject-2.0.so.0', 'libgfortran.so.1', 'liblzma.so.5', 'libXt.so.6', 'libgmodule-2.0.so.0', 'libXi.so.6', 'libgstpbutils-1.0.so.0', 'liborc-0.4.so.0', 
'libgstreamer-1.0.so.0', 'libgsttag-1.0.so.0', 'libgstvideo-1.0.so.0', 'libxslt.so.1', 'libaudio.so.2', 'libjpeg.so.8', 'libgstaudio-1.0.so.0', 'libgstbase-1.0.so.0', 'libgstapp-1.0.so.0', 'libz.so.1', 'libgthread-2.0.so.0', 'libfreetype.so.6', 'libfontconfig.so.1', 'libdbus-1.so.3', 'libsystemd.so.0', 'libltdl.so.7', 'libGLU.so.1', 'libsqlite3.so.0', 'libpgm-5.1.so.0', 'libgomp.so.1', 'libxcb-render.so.0', 'libxcb-shm.so.0', 'libncurses.so.5', 'libxml2.so.2', 'libXss.so.1', 'libXft.so.2', 'libtk.so', 'libtcl.so', 'libasound.so.2', 'libharfbuzz.so.0', 'libpixman-1.so.0', 'libgio-2.0.so.0', 'libXinerama.so.1', 'libselinux.so.1', 'libXcomposite.so.1', 'libthai.so.0', 'libXdamage.so.1', 'libgdk-x11-2.0.so.0', 'libpangoft2-1.0.so.0', 'libcairo.so.2', 'libpangocairo-1.0.so.0', 'libdatrie.so.1', 'libatk-1.0.so.0', 'libXcursor.so.1', 'libXfixes.so.3', 'libgraphite2.so.3', 'libgdk_pixbuf-2.0.so.0', 'libgtk-x11-2.0.so.0', 'libquadmath.so.0', 'libpango-1.0.so.0', 'libXrandr.so.2', 'libgfortran.so.3', 'libjson-c.so.2', 'libshiboken-python2.7.so.1.1', 'libogg.so.0', 'libvorbis.so.0', 'libatlas.so.3', 'libcurl.so.4', 'libhdf5.so.9', 'libodbcinst.so.1', 'libpcap.so.0.9', 'libnetcdf.so.7', 'libblas.so.3', 'libpulse.so.0', 'libcaca.so.0', 'libgstreamer-0.10.so.0', 'libXxf86vm.so.1', 'libhdf5_hl.so.9', 'libpulse-simple.so.0', 'libasyncns.so.0', 'libwrap.so.0', 'libvorbisenc.so.2', 'libmagic.so.1', 'libssl.so.1.0.0', 'libFLAC.so.8', 'libSDL-1.2.so.0', 'libsndfile.so.1', 'libslang.so.2', 'libglapi.so.0', 'libaio.so.1', 'libgstinterfaces-0.10.so.0', 'libpulsecommon-6.0.so', 'libjpeg.so.62', 'libcrypto.so.1.0.0'] This list actually contains a fair number of false positives, so it would need to be pruned manually. If you stare at it a little while, you might see some libraries in there that you recognize that shouldn't be part of the base system, like libatlas.so.3. 
This gist https://gist.github.com/rmcgibbo/a13e7623c38ec54fcc93 contains some more detailed data -- for each of the libraries in the list above, it gives a list of names of the packages that depend on that library. For example, for libatlas.so.3, there is only a single package which depends on it, ["scikit-learn-0.11-np16py27_ce0"]. So, probably a bug. "libgfortran.so.1" is also in the list. It's depended on by ["cvxopt-1.1.6-py27_0", "cvxopt-1.1.7-py27_0", "cvxopt-1.1.7-py34_0", "cvxopt-1.1.7-py35_0", "numpy-1.5.1-py27_1", "numpy-1.5.1-py27_3", "numpy-1.5.1-py27_4", "numpy-1.5.1-py27_ce0", "numpy-1.6.2-py27_1", "numpy-1.6.2-py27_3", "numpy-1.6.2-py27_4", "numpy-1.6.2-py27_ce0", "numpy-1.7.0-py27_0", "numpy-1.7.0b2-py27_ce0", "numpy-1.7.0rc1-py27_0", "numpy-1.7.1-py27_0", "numpy-1.7.1-py27_2", "numpy-1.8.0-py27_0", "numpy-1.8.1-py27_0", "numpy-1.8.1-py34_0", "numpy-1.8.2-py27_0", "numpy-1.8.2-py34_0", "numpy-1.9.0-py27_0", "numpy-1.9.0-py34_0", "numpy-1.9.1-py27_0", "numpy-1.9.1-py34_0", "numpy-1.9.2-py27_0", "numpy-1.9.2-py34_0"]. Note that this list of numpy versions doesn't include the latest ones -- all of the numpy-1.10 binaries made by Continuum pick up libgfortran from a conda package and don't depend on it being provided by the system. Also, the final '_0' or '_1' segment of many of these package names is the build number, which is incremented to make a new build of the same release of a package, usually because of a packaging problem. So many of these packages were probably built incorrectly and superseded by new builds with a higher build number. So it's not perfect. But it might be a useful starting place. -Robert
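
The strategy described above can be sketched roughly as follows. This is only a minimal illustration, not the script actually used for the analysis: the function names and the `declared_libs` input (the set of library names provided by a package's declared conda dependencies) are assumptions made for the example, and the ldd output is passed in as text rather than collected by running ldd.

```python
import re
from collections import Counter

# Matches ldd lines such as
#   "libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f...)"
#   "libfoo.so.1 => not found"
#   "linux-vdso.so.1 (0x00007ffd...)"
# and captures only the leading library name.
_LDD_LINE = re.compile(r"^\s*(\S+)(?:\s+=>\s+.*|\s+\(0x[0-9a-fA-F]+\))\s*$")


def parse_ldd_output(text):
    """Return the set of library names mentioned in ldd output."""
    names = set()
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            # The dynamic loader is printed with a full path; keep the basename.
            names.add(match.group(1).rsplit("/", 1)[-1])
    return names


def undeclared_libraries(ldd_text, declared_libs):
    """Libraries ldd reports that the declared dependencies do not provide.

    `declared_libs` is a hypothetical input: building it requires knowing
    which files each declared conda dependency ships.
    """
    return parse_ldd_output(ldd_text) - set(declared_libs)


def rank_by_frequency(per_package):
    """Order library names from most to least commonly depended-on.

    `per_package` maps a package name to its set of undeclared libraries.
    """
    counts = Counter()
    for libs in per_package.values():
        counts.update(libs)
    return [name for name, _count in counts.most_common()]
```

Because ldd resolves every library against the machine it runs on, a sketch like this reports whatever that particular system happens to pull in, which is one plausible source of the false positives mentioned above.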

On 09.01.2016 12:52, Robert McGibbon wrote:
> Hi all,
> I went ahead and tried to collect a list of all of the libraries that could be considered to constitute the "base" system for linux-64. The strategy I used was to leverage off the work done by the folks at Continuum by searching through their pre-compiled binaries from https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that were depended on (according to ldd) but were not accounted for by the declared dependencies that each package made known to the conda package manager.
Do those packages use ld --as-needed for linking? There are a lot of libraries in that list that I highly doubt are directly used by the packages.
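
One way to look into the question of direct use is to compare what ldd reports (the full transitive closure) with the DT_NEEDED entries recorded in the binary itself, which `readelf -d` prints. The sketch below only parses text in the usual `readelf -d` layout; the function names are made up for this example. Note that a DT_NEEDED entry still does not prove a library is actually used, since linking without --as-needed records every library named on the link line.

```python
import re

# Matches dynamic-section lines such as
#   " 0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]"
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def direct_dependencies(readelf_dynamic_text):
    """Return the DT_NEEDED library names, in the order they are listed."""
    return _NEEDED.findall(readelf_dynamic_text)


def indirect_only(ldd_names, needed_names):
    """Libraries that ldd reports but the binary does not request directly."""
    return set(ldd_names) - set(needed_names)
```

Anything returned by `indirect_only` was pulled in by some other library rather than by the package's own binary.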

> do those packages use ld --as-needed for linking?
Is it possible to check this? I mean, there are over 7000 packages that I checked, and I don't know how they were all built. It's entirely possible that many of the listed libraries are unused. A reasonably common pattern might be that packages use ctypes or dlopen to dynamically load shared libraries that are actually just optional (and catch the error and recover gracefully if the library can't be loaded).
-Robert
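
The optional-library pattern mentioned here looks roughly like this with ctypes. It is a generic sketch, not code from any particular package, and the library name `libhypothetical_accel.so.1` is invented for the example.

```python
import ctypes


def load_optional(soname):
    """Return a ctypes handle for `soname`, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(soname)
    except OSError:
        # The library is missing or unloadable; the caller falls back
        # to a slower or reduced code path instead of failing.
        return None


# Hypothetical optional accelerator library; absent on most systems.
_accel = load_optional("libhypothetical_accel.so.1")
HAVE_ACCEL = _accel is not None
```

A library loaded this way is resolved at run time, so it is not recorded as a link-time dependency of the package's binaries.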
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

On Sat, Jan 9, 2016 at 12:20 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
> On 09.01.2016 12:52, Robert McGibbon wrote:
>> Hi all,
>> I went ahead and tried to collect a list of all of the libraries that could be considered to constitute the "base" system for linux-64. The strategy I used was to leverage off the work done by the folks at Continuum by searching through their pre-compiled binaries from https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that were depended on (according to ldd) but were not accounted for by the declared dependencies that each package made known to the conda package manager.
> Do those packages use ld --as-needed for linking? There are a lot of libraries in that list that I highly doubt are directly used by the packages.
It is also a common problem when building packages without using a "clean" build environment, as it is too easy to pick up dependencies accidentally, especially for autotools-based packages (unless one uses pbuilder or similar tools).
David

On Sat, Jan 9, 2016 at 3:52 AM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
> Hi all,
> I went ahead and tried to collect a list of all of the libraries that could be considered to constitute the "base" system for linux-64. The strategy I used was to leverage off the work done by the folks at Continuum by searching through their pre-compiled binaries from https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries that were depended on (according to ldd) but were not accounted for by the declared dependencies that each package made known to the conda package manager.
> The full list of these system libraries, sorted from most commonly depended-on to rarest, is below. There are 158 of them. [...] So it's not perfect. But it might be a useful starting place.
Unfortunately, yeah, it looks like there are a lot of false positives in here :-(. For example, your list contains liblzma and libsqlite, but both of these are shipped as dependencies of python itself. So probably someone just forgot to declare the dependency explicitly, but got away with it because the libraries were pulled in anyway.
Maybe a better approach would be to look at what libraries are used by an up-to-date default Anaconda install (on the assumption that this is the best tested configuration), and then erase from the list all libraries that are shipped by this configuration (ignoring declared dependencies since those seem to be unreliable)? It's better to be conservative here, since the end goal is to come up with a list of external libraries that we're confident have actually been tested for compatibility by lots and lots of different users.
-n
-- Nathaniel J. Smith -- http://vorpus.org
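
The filtering step proposed here, erasing every library that the install itself ships, could be sketched like this. It is an illustration under simplifying assumptions: libraries are matched purely by file name under an install prefix, and the function names and the `prefix` argument are made up for the example.

```python
import os
import re

# File names that look like shared libraries: "libfoo.so", "libfoo.so.1.2", ...
_SHARED_LIB = re.compile(r"\.so(\.\d+)*$")


def shipped_library_names(prefix):
    """Collect the names of shared-library files found anywhere under `prefix`."""
    names = set()
    for _dirpath, _dirnames, filenames in os.walk(prefix):
        for filename in filenames:
            if _SHARED_LIB.search(filename):
                names.add(filename)
    return names


def external_only(candidates, prefix):
    """Drop candidates that the install ships itself, preserving their order."""
    shipped = shipped_library_names(prefix)
    return [name for name in candidates if name not in shipped]
```

With a prefix that contains liblzma.so.5 and libsqlite3.so.0, those two names drop out of the candidate list regardless of whether any package declared them as dependencies.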

> Maybe a better approach would be to look at what libraries are used by an up-to-date default Anaconda install (on the assumption that this is the best tested configuration)
That's not a bad idea. I also have a couple of other ideas about how to filter this using Debian popularity-contest data and the package graph. I will report back when I have more info.
-Robert

Hi all,
I followed Nathaniel's advice and restricted the search to the packages included in the Anaconda release (as opposed to all of the packages in their repositories), and fixed some technical issues with the way I was doing the analysis.
The new list is much smaller. Here are the shared libraries that the components of Anaconda require the system to provide on Linux 64:
libpanelw.so.5, libncursesw.so.5, libgcc_s.so.1, libstdc++.so.6, libm.so.6, libdl.so.2, librt.so.1, libcrypt.so.1, libc.so.6, libnsl.so.1, libutil.so.1, libpthread.so.0, libX11.so.6, libXext.so.6, libgobject-2.0.so.0, libgthread-2.0.so.0, libglib-2.0.so.0, libXrender.so.1, libICE.so.6, libSM.so.6, libGL.so.1.
Many of these libraries are required simply for the interpreter. The remaining ones, which aren't required by the interpreter but are required by some other package in Anaconda, are:
libgcc_s.so.1, libstdc++.so.6, libXext.so.6, libSM.so.6, libgthread-2.0.so.0, libgobject-2.0.so.0, libglib-2.0.so.0, libICE.so.6, libXrender.so.1, and libGL.so.1.
Most of these are parts of X11 required by Qt (http://doc.qt.io/qt-5/linux-requirements.html).
-Robert

I started working on a tool for checking linux wheels for "manylinux" compatibility, and fixing them up if possible, based on the same ideas as Matthew Brett's delocate <https://github.com/matthew-brett/delocate> for OS X. Current WIP code, if anyone wants to help / throw peanuts, is here: https://github.com/rmcgibbo/deloc8.
It's currently fairly modest: it can only list non-whitelisted external shared library dependencies and verify that sufficiently old versioned symbols for glibc and its ilk are used.
-Robert
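
The versioned-symbol check can be illustrated with a small sketch. This is not the deloc8 code; it only shows the idea, assuming a symbol listing in which glibc references appear as `name@GLIBC_x.y` (the style printed by tools such as `readelf -s`). The function names and the `ceiling` value passed in by the caller are assumptions for the example.

```python
import re

# Matches "GLIBC_2.14" or "GLIBC_2.2.5"; skips GLIBCXX_* and GLIBC_PRIVATE.
_GLIBC_REF = re.compile(r"\bGLIBC_(\d+(?:\.\d+)*)")


def glibc_versions_required(symbol_listing):
    """Return the glibc symbol versions referenced, as tuples of ints."""
    versions = set()
    for match in _GLIBC_REF.finditer(symbol_listing):
        versions.add(tuple(int(part) for part in match.group(1).split(".")))
    return versions


def newest_glibc_required(symbol_listing):
    """Return the newest glibc version referenced, or None if there is none."""
    versions = glibc_versions_required(symbol_listing)
    return max(versions) if versions else None


def uses_only_old_symbols(symbol_listing, ceiling):
    """True if no referenced glibc symbol version is newer than `ceiling`.

    `ceiling` is a tuple such as (2, 5); the right value depends on the
    oldest system the wheel is meant to run on.
    """
    newest = newest_glibc_required(symbol_listing)
    return newest is None or newest <= ceiling
```

A binary that references `memcpy@GLIBC_2.14` would fail a check with a ceiling of (2, 5), while one that only references `printf@GLIBC_2.2.5` would pass.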

On Mon, Jan 11, 2016 at 6:06 AM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
> I started working on a tool for checking linux wheels for "manylinux" compatibility, and fixing them up if possible, based on the same ideas as Matthew Brett's delocate for OS X. Current WIP code, if anyone wants to help / throw peanuts, is here: https://github.com/rmcgibbo/deloc8.
> It's currently fairly modest and can only list non-whitelisted external shared library dependencies, and verify that sufficiently old versioned symbols for glibc and its ilk are used.
That is super cool! Also, this week David C. @ Enthought contributed the docker image that they use to actually make compatible builds, so I guess we have some momentum; let's make this happen :-).
I just made a repo and a mailing list to continue the discussion...
https://github.com/manylinux/manylinux
https://groups.google.com/forum/#!forum/manylinux-discuss
-n
-- Nathaniel J. Smith -- http://vorpus.org
participants (4): David Cournapeau, Julian Taylor, Nathaniel Smith, Robert McGibbon