PEP idea regarding non-glibc systems
Hi all,

I have an idea regarding Python binary wheels on non-glibc platforms; it seems that I initially posted it to the wrong list ([1]).

Long story short, the proposal is to use platform tuples (like compiler target triples) in wheel names, which would allow much broader platform support, for example:

package-1.0-cp36-cp36m-amd64_linux_gnu.whl
package-1.0-cp36-cp36m-amd64_linux_musl.whl

So only the {platform tag} part would change. Glibc/musl detection is fairly straightforward and could build on the existing check in PEP 513 [2].

Let me know what you think.

Best regards,
Alex

[1] https://mail.python.org/pipermail/python-list/2019-February/739524.html
[2] https://www.python.org/dev/peps/pep-0513/#id49
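For reference, wheel filenames follow the PEP 427 scheme {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl, so the proposal only changes the final field. A trivial sketch of the composition (the helper name is illustrative, not an existing API):

    def wheel_filename(dist, version, python_tag, abi_tag, platform_tag):
        # PEP 427: {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
        return "-".join([dist, version, python_tag, abi_tag, platform_tag]) + ".whl"

    print(wheel_filename("package", "1.0", "cp36", "cp36m", "amd64_linux_musl"))
    # package-1.0-cp36-cp36m-amd64_linux_musl.whl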
The challenge here is: the purpose of a target triple is to tell a compiler/linker toolchain which kind of code to generate, e.g. when cross-compiling. The purpose of a wheel tag is to tell you whether a given wheel will work on a given system. It turns out these are different things :-).

For example, Ubuntu 18.10 and RHEL 6 are both 'amd64-linux-gnu', because they use the same instruction set, the same binary format (ELF), etc. But if you build a wheel on Ubuntu 18.10, it definitely will not work on RHEL 6. (The other way around might work, if you do other things right.)

In practice Windows and macOS are already fine; the place where this would be useful is Linux wheels for platforms that use non-Intel architectures or non-glibc libcs. We do have an idea for making it easier to support newer glibcs and also extending to all architectures: https://mail.python.org/archives/list/distutils-sig@python.org/thread/6AFS4H...

Adding musl is a bit trickier, since I'm not sure what the details of their ABI compatibility are, and they intentionally make it difficult to figure out whether you're running on musl. But if someone could convince them to publish more information, then we could fix that too.

-n
-- Nathaniel J. Smith -- https://vorpus.org
Hi Nathaniel,

Thanks for your answer.

Building on your example of RHEL and Ubuntu: take RHEL 6, which uses glibc 2.12. If you cross-compile for it (using the same gcc that RHEL uses), the wheel will surely work on Ubuntu 18.10 :) I don't think this is much of an issue, since wheels are built against these minimal runtime requirements anyway, unless they're built on a local machine – but in that case they will just work™. In any case, providing a working toolchain is outside the scope of Python tooling: Gentoo has an awesome crossdev tool for creating cross-toolchains, and there's crosstool-ng of course. There are workarounds for running code built on newer systems on older ones, but they seem pretty tedious ([1]).

Speaking of runtime detection, I see it this way – since one of the most reliable ways to check a program's dependencies is to invoke something like ldd or objdump, we can essentially do the same (see the sketch after this message):

1. Take the minimal code required to extract the ELF ".interp" field [2] (I used code from [3]);
2. Process sys.executable with it.

Here is what it returns (grepping for "/lib" because the value starts on a new line):

Gentoo amd64 glibc:
# python3 readelf.py $(which python3) | grep "/lib"
b'/lib64/ld-linux-x86-64.so.2\x00'

Alpine amd64 docker (official python3 alpine image):
# python3 readelf.py $(which python3) | grep "/lib"
b'/lib/ld-musl-x86_64.so.1\x00'

3. In my opinion that's already enough, but we can go further and do what ldd does:

# /lib/ld-musl-x86_64.so.1 --list $(which python3)
/lib/ld-musl-x86_64.so.1 (0x7fbc36567000)
libpython3.7m.so.1.0 => /usr/local/lib/libpython3.7m.so.1.0 (0x7fbc3622a000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fbc36567000)

Basically it's just string matching at this point, and the only open question is whether the name of the dynamic linker is enough, or whether all libraries should be iterated over until an exact "musl" or "libc" match. Parsing the "Dynamic section" turns out to be pretty useless – it's empty on Alpine (or my parsing code is buggy). If the ".interp" field is not present, the interpreter is statically linked :)

4. If any glibc-specific functionality is needed at this point, the code from PEP 513 works well. Maybe it's even better to run it first, and fall back to ELF parsing if it fails to open glibc.

Thanks,
Alex

[1] https://snorfalorpagus.net/blog/2016/07/17/compiling-python-extensions-for-o...
[2] https://www.linuxjournal.com/article/1060
[3] https://github.com/detailyang/readelf/blob/master/readelf/readelf.py#L545
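A minimal sketch of steps 1–2 above (this is not the readelf.py from [3]; it assumes a 64-bit little-endian ELF and uses only the standard library):

    import struct
    import sys

    PT_INTERP = 3

    def elf_interpreter(path):
        """Return the dynamic linker path from the ELF PT_INTERP segment,
        or None if there is none (i.e. the binary is statically linked)."""
        with open(path, "rb") as f:
            ident = f.read(16)
            if ident[:4] != b"\x7fELF":
                raise ValueError("not an ELF file")
            if ident[4] != 2 or ident[5] != 1:
                raise NotImplementedError("sketch handles 64-bit little-endian only")
            f.seek(0x20)                                   # e_phoff
            (e_phoff,) = struct.unpack("<Q", f.read(8))
            f.seek(0x36)                                   # e_phentsize, e_phnum
            e_phentsize, e_phnum = struct.unpack("<HH", f.read(4))
            for i in range(e_phnum):
                f.seek(e_phoff + i * e_phentsize)
                p_type, _flags, p_offset = struct.unpack("<IIQ", f.read(16))
                f.seek(e_phoff + i * e_phentsize + 0x20)   # p_filesz
                (p_filesz,) = struct.unpack("<Q", f.read(8))
                if p_type == PT_INTERP:
                    f.seek(p_offset)
                    return f.read(p_filesz).rstrip(b"\x00").decode()
        return None

    # Note: sys.executable may be a wrapper in some setups – this is the
    # "find the python binary" caveat discussed below.
    interp = elf_interpreter(sys.executable)
    print("Interpreter extracted:", interp)
    print("Running on musl" if interp and "musl" in interp else "Not musl (or static)")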
I've put the combined code here: https://gist.github.com/lyssdod/f51579ae8d93c8657a5564aefc2ffbca

Just download it, make it executable and run it.

amd64 Alpine:
# ./guess_pyruntime.py
Interpreter extracted: /lib/ld-musl-x86_64.so.1
Running on musl

amd64 Gentoo glibc:
# ./guess_pyruntime.py
Interpreter extracted: /lib64/ld-linux-x86-64.so.2
Running on glibc version 2.27
Sniffing out the ELF loader is definitely more complicated than ideal – e.g. it adds a "find the python binary" step that could go wrong – but, ok, if that were the only barrier maybe we could manage.

The bigger problem is: how do we figure out whether a wheel built against *that* musl on *that* machine will work with *this* musl on *this* machine? For glibc, this involves three pieces, each of which is non-trivial:

- the glibc maintainers provide some careful, documented guarantees about when a library built against one glibc version will run with another glibc version, and they encode this in machine-readable form in their symbol versions
- auditwheel checks the symbol versions to derive a summary of what the wheel needs, and stores it in the wheel metadata
- pip checks the local system's glibc version against this metadata

A simple "is this musl or not?" check isn't useful on its own. We also need some musl equivalent for this other machinery. (It doesn't have to work the same way, but it has to accomplish the same result.)

If you want to keep moving this forward you're going to have to talk to the musl maintainers.

-n
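For reference, the pip piece of that list is essentially the PEP 513 reference code; a sketch along those lines (not pip's exact implementation, and the required version numbers are illustrative):

    import ctypes

    def glibc_version_string():
        # Ask the C library already loaded into this process for its version.
        # On non-glibc systems (e.g. musl) the symbol doesn't exist.
        process_namespace = ctypes.CDLL(None)
        try:
            gnu_get_libc_version = process_namespace.gnu_get_libc_version
        except AttributeError:
            return None
        gnu_get_libc_version.restype = ctypes.c_char_p
        return gnu_get_libc_version().decode("ascii")

    def have_compatible_glibc(required_major, minimum_minor):
        # pip-style check: same major version, minor >= the wheel's minimum.
        version_str = glibc_version_string()
        if version_str is None:
            return False
        major, minor = (int(p) for p in version_str.split(".")[:2])
        return major == required_major and minor >= minimum_minor

    print(have_compatible_glibc(2, 12))  # True on RHEL 6 and anything newer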
I've asked on the musl mailing list, and it looks possible:

---------- Forwarded message ---------
From: Rich Felker <dalias@libc.org>
Date: Tue, Feb 26, 2019 at 4:11 PM
Subject: Re: [musl] ABI compatibility between versions
To: Alexander Revin <lyssdod@gmail.com>
Cc: <musl@lists.openwall.com>

On Tue, Feb 26, 2019 at 12:28:31PM +0100, Alexander Revin wrote:
> > but for this reason a binary compiled against a new version of glibc is unlikely to work with an older version (which is why anybody who wants to distribute a binary that works across different linux distros compiles against a very old version of glibc, which of course means lots of old bugs), while for musl such breakage is much more rare (it happens when a new symbol is introduced and the binary uses that).
> So it's generally similar to the glibc approach – link against an old musl, which doesn't expose new symbols?
This works but isn't necessarily needed. As long as your application does not use any symbols that were introduced in a newer musl, it will run with an older one, subject to any bugs the older one might have. If configure is detecting and causing the program's build process to link to new symbols in the newer musl, and you don't want to depend on that, you can usually override the detections with configure variables on the configure command line or in an explicit config.cache file, or the equivalent for other non-autoconf-based build systems.

---------- End of forwarded message ---------

The Alpine folks don't seem to use any specific build flags, though the find_library function was customized: https://git.alpinelinux.org/aports/tree/main/python3
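If a concrete musl version were ever needed for such a compatibility check, one possible source – purely a sketch, not existing pip or auditwheel behaviour – is that musl's dynamic linker prints its own version on stderr when executed directly:

    import re
    import subprocess

    def musl_version(ld_path="/lib/ld-musl-x86_64.so.1"):
        # The musl dynamic linker prints "musl libc ... Version x.y.z"
        # on stderr when run with no arguments.
        proc = subprocess.run([ld_path], stderr=subprocess.PIPE,
                              universal_newlines=True)
        match = re.search(r"Version (\d+)\.(\d+)\.(\d+)", proc.stderr)
        return tuple(int(g) for g in match.groups()) if match else None

    print(musl_version())  # e.g. (1, 1, 20)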
OK, so what's your proposal for what auditwheel/pip/etc. should do to support musl? Do we need to put a list of which symbols each wheel uses in the filename, or ...?
-- Nathaniel J. Smith -- https://vorpus.org
My proposal is the following:

1) Put the runtime detection function in pip, so it can decide which platform tag to fetch from PyPI;
2) Regarding auditwheel – I think it's worth trying to mainline Alpine's patches, so that musl-compiled Python can check for libraries. Chances are this will fix auditwheel right away without any further modifications;
3) Document it in a PEP?

I don't really see a point in putting symbols somewhere in the filename, because auditwheel already handles that – it bundles the dependencies, so symbols can be resolved. My initial proposal of using "musl" as part of a "compiler triple" for {platform_tag}, and changing the tag format accordingly, stays the same.

What do you think? What would be the right way to test it? I can try to create a custom PyPI mirror which allows the new platform tags to be uploaded, and fork pip + auditwheel to make the aforementioned changes, but maybe there are better ways...

Thanks,
Alex
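A sketch of what (1) could look like inside pip (the function name and the amd64 spelling are illustrative, following the examples from the original proposal; the libc argument would come from the runtime detection shown earlier):

    import platform

    def proposed_platform_tag(libc):
        # libc is "gnu" or "musl", as detected by the runtime check above.
        arch = platform.machine().lower()  # e.g. "x86_64"
        if arch == "x86_64":
            arch = "amd64"                 # match the tag spelling in the proposal
        return "{}_linux_{}".format(arch, libc)

    print(proposed_platform_tag("musl"))   # amd64_linux_musl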
A related question – isn't auditwheel run against all Linux wheels anyway, before the fixed manylinux* wheels are uploaded to PyPI? If so, there's probably a need for a separate container for musl systems, though this has to be investigated.