lasizoillo wrote:
The manylinux project does not specify a way to record "environment set up" information required to build wheels from certain packages. Some packages require that the stock manylinux environment be modified by installing system dependencies, setting environment variables, modifying the filesystem, etc. prior to building, or the building and bundling will fail. Currently, package authors capture this information in project-specific documentation, or not at all. In the event that a wheel is not provided by the author, consumers that want a compiled distribution (users focused on reproducible builds, build efficiency) are required to locate this information in project-specific documentation or do research on the internals of the package to gather this information themselves. Currently, package consumers cannot easily modify or contribute to this data as it is not standardized.

What are everyone's thoughts about adding a section to pyproject.toml to capture this information? This allows package consumers to easily find and contribute back to this data, and programs like installers, builders, etc. can read this data as well.

Is this PEP-able or where could this best be contributed? I am interested in (and intend on) contributing to solving this problem if it is accepted as a good idea! Thank you for your input!

Something like this:

    [manylinux1]
    system_dependencies: foo-1.0.0
    environment_variables: FOO=BAR
    python_packages: foo==1.0.0

It's interesting to me, but I see some problems.

System_dependencies: many Linux distributions use different names for the same package (example: apache2 vs httpd). Sometimes the same dependency is split across multiple packages (example: python3.9 depends on python3.9-minimal and...). How many times have you seen a machine with python but without idle or tkinter? And naming conventions are a pain for both build-time and runtime system dependencies. Fortunately we have containers and can run different Linux flavours, including old versions of those flavours, which is very important.

Environment variables: Usually used to inject configuration. My GOPATH env var is under my home. Maybe an env name can have a description, a default value,... but not a fixed value, because a fixed value is not easy to share with others.

To avoid system dependencies, some build tools like https://github.com/joerick/cibuildwheel are centered on CI suites. With a CI you can use windows, linux, osx,... without having to think about setting up virtual machines (libvirt, virtualbox,...) or coordinating buildbot farms that are more or less well configured. Once a wheel is compiled, build dependencies are no longer needed. Then, cibuildwheel uses other libraries like auditwheel to embed shared libraries into wheels. So runtime system dependencies are included in the wheel too. Mostly (see limitations https://github.com/pypa/auditwheel#limitations).

So requirements should be something like...

base_system_dependency: a link to an OCI container, for example (open specifications if we need to choose a technology). This marks the minimal C library version we stay compatible with.

extra_base_system_repositories: We want modern libraries, with all patches, but running on an old Linux version to improve compatibility with the C library.

build_dependencies: they must exist in base_system_dependency or extra_base_system_repositories.

build_system_dependencies: things used to test, compile and verify the build. For example patchelf, used by auditwheel.

build_python_dependencies: libraries used to build things. Like auditwheel, used to pack runtime dependencies into wheels.

build_steps: Maybe even environment variables are a good idea if it will run in containers. It could include unit tests, compilation, integration tests after compilation,...

artifacts: wheels, rpm packages, deb packages,...

runtime_dependencies: All things that cannot be included in our generated artifacts.

python_runtime_dependencies: The simple part.

system_runtime_dependencies: For this section, though, system dependencies will not run containerized. We return to different names, available versions, incompatibilities,...

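As a rough illustration of the fields listed above, the following Python sketch groups them into a single dataclass. The class name `BuildMetadata`, the `warnings()` helper and all example values (including `./build.sh`) are assumptions made for illustration; they are not part of any proposal or existing tool. The umbrella names `build_dependencies` and `runtime_dependencies` are represented only through their more specific sub-fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BuildMetadata:
    # OCI image reference; it fixes the oldest C library we stay compatible with.
    base_system_dependency: str
    extra_base_system_repositories: List[str] = field(default_factory=list)
    build_system_dependencies: List[str] = field(default_factory=list)  # e.g. patchelf
    build_python_dependencies: List[str] = field(default_factory=list)  # e.g. auditwheel
    build_steps: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)
    python_runtime_dependencies: List[str] = field(default_factory=list)
    system_runtime_dependencies: List[str] = field(default_factory=list)

    def warnings(self) -> List[str]:
        """Collect human-readable notes about fields that need attention."""
        notes = []
        if not self.base_system_dependency:
            notes.append("base_system_dependency is required")
        # Runtime system packages are resolved on the user's machine, not in
        # the build container, so their names can differ per distribution.
        for name in self.system_runtime_dependencies:
            notes.append(f"{name}: not containerized, name may differ per distribution")
        return notes


meta = BuildMetadata(
    base_system_dependency="quay.io/pypa/manylinux1_x86_64",  # example value
    build_system_dependencies=["patchelf"],
    build_python_dependencies=["auditwheel"],
    build_steps=["./build.sh"],  # hypothetical script
    artifacts=["wheel"],
    system_runtime_dependencies=["httpd"],
)
print(meta.warnings())
```

Everything that runs inside the container can be validated mechanically, while `system_runtime_dependencies` can only be flagged, which mirrors the naming problem described earlier in this message.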
I think it's possible to generate metadata describing build requirements. Even if it fails for automated builds, it is a "TL;DR" guide to how to install the package. But if it can serve to generate automated builds, repairing the metadata will be easy, and then the automated build will run. Then we'll need a pyproject2docker or pyproject2gitlabci or... Maybe such reference implementations will be needed to test the specification. Metadata should be used to avoid technological dependence in builds. If it doesn't end up in a PEP, it is still a good project for filling a private repo with wheels.

Regards, Javi

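To make the pyproject2docker idea a little more concrete, here is a minimal Python sketch that turns build metadata into Dockerfile text. No such tool exists as far as this thread is concerned: the function name `render_dockerfile`, the dictionary layout, the use of `yum` as the package manager, the `pip` command name and the `./build.sh` step are all assumptions chosen for illustration.

```python
def render_dockerfile(base_image, spec, pip_cmd="pip"):
    """Render Dockerfile text from a metadata dict (hypothetical layout)."""
    lines = [f"FROM {base_image}"]

    # Environment variables are given as "NAME=VALUE" strings.
    for assignment in spec.get("environment_variables", []):
        name, _, value = assignment.partition("=")
        lines.append(f"ENV {name}={value}")

    # Assumption: the base image uses yum, as CentOS-based images do.
    packages = spec.get("system_dependencies", [])
    if packages:
        lines.append("RUN yum install -y " + " ".join(packages))

    # Assumption: pip_cmd is available on PATH inside the image.
    python_packages = spec.get("python_packages", [])
    if python_packages:
        lines.append(f"RUN {pip_cmd} install " + " ".join(python_packages))

    # Each build step becomes its own layer, in the order given.
    for step in spec.get("build_steps", []):
        lines.append(f"RUN {step}")

    return "\n".join(lines) + "\n"


spec = {
    "system_dependencies": ["foo-1.0.0"],
    "environment_variables": ["FOO=BAR"],
    "python_packages": ["foo==1.0.0"],
    "build_steps": ["./build.sh"],  # hypothetical script
}
dockerfile = render_dockerfile("quay.io/pypa/manylinux1_x86_64", spec)
print(dockerfile)
```

A pyproject2gitlabci variant could reuse the same dictionary and only swap the rendering function, which is the kind of technology independence the metadata is meant to provide.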
Excellent thoughts, thanks for your input! The additional configuration fields you suggested capture valuable information and speak to a comprehensive build system that allows you to build in any image you want, with lots of customization. This is supported by PEP 600 -- Future 'manylinux' Platform Tags for Portable Linux Built Distributions (https://www.python.org/dev/peps/pep-0600/)...
Any method of producing wheels which meets these criteria is acceptable.

However, PEP 600 also says...

...in practice we expect that the auditwheel project will maintain an up-to-date set of tools and build images for producing manylinux wheels, as well as documentation about how they work and how to use them, and that most maintainers will want to use those.

This suggests that targeting the manylinux images will create the most value, as it is assumed those images are what the majority of manylinux wheels will be built on. The work I am suggesting is to standardize "system set up" in manylinux containers prior to building. This would make the package author responsible for calling `wheel` to build, `auditwheel` to bundle and repair, and for uploading to their index of choice, so configuration for these actions is explicitly not captured in the proposed spec.

In response to your suggestions for fields in the configuration:

Fields that are appropriate to add:

1. `extra_system_repositories`
2. `steps`

Fields that are not necessary:

1. `base_system`: This is not necessary when targeting only manylinux images, as the section header can denote which image to run in and any "builder" can map between values. Allowing custom URLs doesn't create value here, as there are only a few official manylinux images.
2. `system_dependencies` and `build_system_dependencies`: why separate these into two groups?
3. `system_runtime_dependencies`: I think this is a great idea but out of scope. This work should focus on build dependencies to avoid scope creep.
4. `python_runtime_dependencies`: I believe this information is already captured in PEP-631: Dependency specification in pyproject.toml based on PEP 508 (https://www.python.org/dev/peps/pep-0631/)

Updated working spec:

```toml
[manylinux_build_spec.manylinux2014]
extra_base_system_repositories = ["FOO"]
system_dependencies = ["bar-1.0.0"]
environment_variables = ["FOO=BAR"]
python_dependencies = ["foo==1.0.0"]
steps = ["./scripts/build_and_upload.sh --my_option"]
```

Thanks again!

- Chris Antonellis