Fwd: Re: [Python-ideas] Add processor generation to wheel metadata
Forwarding here as suggested.

-------- Forwarded Message --------
Subject: Re: [Python-ideas] Add processor generation to wheel metadata
Date: Mon, 30 Oct 2017 23:02:08 -0700
From: Nathaniel Smith <njs@pobox.com>
To: Ivan Pozdeev <vano@mail.mipt.ru>
CC: python-ideas@python.org <python-ideas@python.org>

On Mon, Oct 30, 2017 at 5:45 AM, Ivan Pozdeev via Python-ideas <python-ideas@python.org> wrote:
Generally, packages are compiled for the same processor generation as the corresponding Python. But not always -- e.g. NumPy opted for SSE2 even for Py2 to work around some compiler bug (https://github.com/numpy/numpy/issues/6428). I was once bitten by that on an old machine and found that there was no way for pip to have checked for it. Besides, performance-oriented packages like the one mentioned could probably benefit from newer instructions.
You should probably resend this to distutils-sig instead of python-ideas -- that's where discussions about python packaging happen. (Python-ideas is more for discussions about the language itself.)

-n

--
Nathaniel J. Smith -- https://vorpus.org
Missed the fact that Nathaniel didn't quote the entire original message. Here it is:

------------------------------------------------------------------------

Generally, packages are compiled for the same processor generation as the corresponding Python. But not always -- e.g. NumPy opted for SSE2 even for Py2 to work around some compiler bug (https://github.com/numpy/numpy/issues/6428). I was once bitten by that on an old machine and found that there was no way for pip to have checked for it. Besides, performance-oriented packages like the one mentioned could probably benefit from newer instructions.

Regarding identifiers: gcc, cl and clang each maintain their own private sets of generation identifiers:

https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/i386-and-x86_002d64-Options.htm...
https://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx
https://clang.llvm.org/doxygen/Driver_2ToolChains_2Arch_2X86_8cpp_source.htm...

Linux packages typically use gcc's. Clang generally follows in gcc's footsteps and also accepts cl's IDs as aliases.

So, using the IDs of whatever compiler is used to build the package (i.e. most likely the canonical compiler for CPython on that platform) looks like the simple&stupid(r) way -- we can just take the value of the "-march" argument.

The tricky part is mapping the system's processor to an ID when checking compatibility: the logic will have to maintain such a mapping (that's the job of the `wheel' package maintainers, though, I guess). I also guess that there are cases where there's no such thing as _the_ system's processor.

--
Regards,
Ivan
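A rough sketch of what such a check could look like on the installer side (the -march table below is a tiny, made-up subset of gcc's identifiers, and the feature detection reads /proc/cpuinfo, so it is Linux-only):

import platform

# Illustrative subset only: maps gcc-style -march generation IDs to the CPU
# feature flags (as reported in /proc/cpuinfo) that a build targeting that
# generation may rely on.
MARCH_FEATURES = {
    "pentium4": {"sse", "sse2"},
    "core2": {"sse", "sse2", "pni", "ssse3"},  # SSE3 shows up as "pni" in cpuinfo
    "haswell": {"sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"},
}

def local_cpu_features():
    """Best-effort CPU feature detection (Linux only in this sketch)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def wheel_is_compatible(march_id):
    """Would a wheel built with -march=<march_id> run on this machine?"""
    required = MARCH_FEATURES.get(march_id)
    if required is None:
        return False  # unknown generation ID: be conservative
    return required <= local_cpu_features()

if platform.system() == "Linux":
    print(wheel_is_compatible("pentium4"))  # e.g. the NumPy SSE2 case above

The "pni" vs "SSE3" wrinkle in the table is exactly the kind of mapping work mentioned above: the installer-side logic has to translate between the compiler's naming and whatever the OS reports.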
Maybe the anaconda team has some insight on a standard way to capture (& configure) compiler versions and flags in package metadata?

From https://www.anaconda.com/blog/developer-blog/announcing-the-release-of-anaco... :

The Anaconda 5.0 release used very modern compilers to rebuild almost everything (~99.5%) provided in the installers for x86 Linux and MacOS. This enables Anaconda users to get the benefits of the latest compilers while still allowing support for older operating systems (back to MacOS 10.9 and CentOS 6). Our own builds of GCC 7.2 (Linux) and Clang 4.0.1 (MacOS) are used, and every reasonable security flag has been enabled. CFLAGS and CXXFLAGS are no longer managed by each package; instead, compiler activation sets them globally.

The packages built with the new compilers are in a different channel from packages built the old way, and as we build out this new channel, we will eventually be able to change the default experience to only using these packages. Interested in using this approach to build your own conda packages? Stay tuned for a more developer-focused blog post!

On Tuesday, October 31, 2017, Ivan Pozdeev via Distutils-SIG <distutils-sig@python.org> wrote:
Missed the fact that Nathaniel didn't quote the entire original message. Here it is: ------------------------------
Generally, packages are compiled for the same processor generation as the corresponding Python. But not always -- e.g. NumPy opted for SSE2 even for Py2 to work around some compiler bug (https://github.com/numpy/numpy/issues/6428). I was once bitten by that on an old machine and found that there was no way for pip to have checked for it. Besides, performance-oriented packages like the one mentioned could probably benefit from newer instructions.
Regarding identifiers: gcc, cl and clang each maintain their own private sets of generation identifiers:

https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/i386-and-x86_002d64-Options.html
https://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx
https://clang.llvm.org/doxygen/Driver_2ToolChains_2Arch_2X86_8cpp_source.html
Linux packages typically use gcc's. Clang generally follows in gcc's footsteps and also accepts cl's IDs as aliases.
So, using the IDs of whatever compiler is used to build the package (i.e. most likely the canonical compiler for CPython on that platform) looks like the simple&stupid(r) way -- we can just take the value of the "-march" argument.
The tricky part is mapping the system's processor to an ID when checking compatibility: the logic will have to maintain such a mapping (that's the job of the `wheel' package maintainers, though, I guess). I also guess that there are cases where there's no such thing as _the_ system's processor.
-- Regards, Ivan
On 1 November 2017 at 02:23, Wes Turner <wes.turner@gmail.com> wrote:
Maybe the anaconda team has some insight on a standard way to capture (& configure) compiler versions and flags in package metadata?
The short answer is "You don't, and instead live with the related uncertainty".

Similar to Linux distro packages, these expectations are set by the build environment, and it's up to whoever's publishing a conda channel to make sure all the packages it contains are consistent both with each other and with third-party wheel files (if they want to support layered applications which mix third-party wheel files with platform-provided binaries).

https://reproducible-builds.org/docs/perimeter/ discusses some of the aspects of a system that may or may not affect a build process, and hence the ABI compatibility of a result. There unfortunately isn't a generic way of knowing which CPU flags are actually going to be important in determining a project's ABI or its platform requirements (e.g. CPython's independence of SSE2 doesn't arise from a specific choice - it arises from the fact that there isn't any code in CPython that relies on CPU-provided vector operations).

In the PEP 426/459 draft, my proposal was to have an optional extension to the metadata called "python.constraints" that allowed projects to declare particular compatibility constraints on their installation environments: https://www.python.org/dev/peps/pep-0459/#the-python-constraints-extension

That way, an installer could download a wheel file, check if its usage constraints were met, and if not, fall back to building from source. That still seems like a reasonable way to go to me, although we may want to look at defining it as a separate JSON file stored in dist-info (similar to entry_points.txt), rather than tying it to a new version of the main metadata spec.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
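As a purely illustrative sketch (the field names below are invented for the example and are not the actual PEP 459 "python.constraints" schema), the installer-side flow being described would be roughly:

import json

def select_candidate(candidates, local_env):
    """candidates: list of (build label, constraints JSON text or None).

    Returns the first build whose declared constraints the local environment
    satisfies, or None to signal falling back to building from source.
    """
    for label, raw in candidates:
        if raw is None:
            return label  # no constraints declared: assume compatible
        constraints = json.loads(raw)
        needed = set(constraints.get("requires_cpu_features", []))
        if needed <= set(local_env.get("cpu_features", [])):
            return label
    return None

# "cpu_features" would come from real hardware detection on the target machine.
local_env = {"cpu_features": ["sse", "sse2"]}
candidates = [
    ("avx2 build", '{"requires_cpu_features": ["avx2"]}'),
    ("baseline sse2 build", '{"requires_cpu_features": ["sse2"]}'),
]
print(select_candidate(candidates, local_env) or "no compatible wheel, build from source")

The key property is that the constraints document is small and declarative, so it could live either in the main metadata or in a separate JSON file under dist-info, as suggested above.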
On 01.11.2017 7:33, Nick Coghlan wrote:
> https://reproducible-builds.org/docs/perimeter/ discusses some of the aspects of a system that may or may not affect a build process, and hence the ABI compatibility of a result. There unfortunately isn't a generic way of knowing which CPU flags are actually going to be important in determining a project's ABI or its platform requirements

There's no such thing as "a set of compiler flags that unequivocally defines ABI". Any preprocessor macro that changes a type or a subroutine's signature will result in an incompatibility, too. So, "ABI" can only be defined on a case-by-case basis.

> (e.g. CPython's independence of SSE2 doesn't arise from a specific choice - it arises from the fact that there isn't any code in CPython that relies on CPU-provided vector operations).

CPython is compiled with the corresponding /arch and doesn't use SSE2 intrinsics (I did check that cl.exe doesn't trigger an error on the latter even with a lower /arch). That's enough to guarantee the instruction set that it uses. So, I don't see a problem here.

> That way, an installer could download a wheel file, check if its usage constraints were met, and if not, fall back to building from source. That still seems like a reasonable way to go to me, although we may want to look at defining it as a separate JSON file stored in dist-info (similar to entry_points.txt), rather than tying it to a new version of the main metadata spec.

Having to download a wheel just to check compatibility is a deal-breaker. There are dozens of them for a given package (pip examines old versions, too), and each one can weigh quite a bit. With more flavors to add, the number will multiply. The metadata required to check whether a wheel is suitable for installation must be available separately from its contents.
Currently, pip examines the triplets in package names. What I'm suggesting is to put/append something more specific than "win32" into it when applicable.
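For illustration (the suffixes are invented for the example and are not existing platform tags), that would mean file names along the lines of:

somepkg-1.0-cp36-cp36m-win32.whl           # today
somepkg-1.0-cp36-cp36m-win32_sse2.whl      # hypothetical: build requiring SSE2
somepkg-1.0-cp36-cp36m-win_amd64_avx2.whl  # hypothetical: build requiring AVX2

so pip could reject or prefer a file from the name alone, without downloading anything.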
> In the PEP 426/459 draft, my proposal was to have an optional extension to the metadata called "python.constraints" that allowed projects to declare particular compatibility constraints on their installation environments: https://www.python.org/dev/peps/pep-0459/#the-python-constraints-extension

I checked it out. It's not applicable here for the aforementioned reason, and one more thing.
A package's file name should be unique - to easily distinguish it from others. The _one obvious way_(tm) is to make it contain all the metadata fields that distinguish this package from others. I mean, you _can_ make metadata available otherwise (with some API, in a small companion file, etc.) -- but that would be suboptimal compared to simply having a meaningful file name that goes with the file wherever it goes. (For an illustration of why this is an anti-pattern, you can check out a WSUS server's contents -- that's the epitome of meaningless file names: you can't know anything about a file without a database query. Not only is that grossly inconvenient; if the database goes under, the entire multi-gigabyte storage becomes useless.)

--
Regards,
Ivan
On 31.10.2017 19:23, Wes Turner wrote:
Maybe the anaconda team has some insight on a standard way to capture (& configure) compiler versions and flags in package metadata?
From https://www.anaconda.com/blog/developer-blog/announcing-the-release-of-anaco... :
The Anaconda 5.0 release used very modern compilers to rebuild almost everything (~99.5%) provided in the installers for x86 Linux and MacOS. This enables Anaconda users to get the benefits of the latest compilers while still allowing support for older operating systems (back to MacOS 10.9 and CentOS 6). Our own builds of GCC 7.2 (Linux) and Clang 4.0.1 (MacOS) are used, and every reasonable security flag has been enabled. CFLAGS and CXXFLAGS are no longer managed by each package; instead, compiler activation sets them globally.
The packages built with the new compilers are in a different channel from packages built the old way, and as we build out this new channel, we will eventually be able to change the default experience to only using these packages. Interested in using this approach to build your own conda packages? Stay tuned for a more developer-focused blog post!

It's all the same -- all packages are supposed to be built with the same compiler settings. So the problem of saving the flags into package metadata and checking compatibility with the local system doesn't come up there.
There's, in fact, the same problem of potential unchecked incompatibility if a package is compiled in a different environment: see https://stackoverflow.com/questions/46912969/osx-c14-compiler-not-detected-m... (in that case, it's a different compiler version).

--
Regards,
Ivan