The manylinux project does not specify a way to record the "environment set-up" information required to build wheels for certain packages.

Some packages require that the stock manylinux environment be modified prior to building, by installing system dependencies, setting environment variables, modifying the filesystem, etc., or the build and bundling will fail. Currently, package authors capture this information in project-specific documentation, or not at all.

In the event that a wheel is not provided by the author, consumers who want a compiled distribution (e.g. users focused on reproducible builds or build efficiency) have to locate this information in project-specific documentation or research the internals of the package themselves. Currently, package consumers cannot easily modify or contribute to this data because it is not standardized.

What are everyone's thoughts on adding a section to pyproject.toml to capture this information? This would allow package consumers to easily find and contribute back to this data, and programs such as installers and builders could read it as well. Is this PEP-able, or where could this best be contributed? I am interested in (and intend to contribute to) solving this problem if it is accepted as a good idea! Thank you for your input!

Something like this:

[manylinux1]
system_dependencies = ["foo-1.0.0"]
environment_variables = { FOO = "BAR" }
python_packages = ["foo==1.0.0"]

It's interesting to me, but I see some problems.

System_dependencies: different Linux distributions use different package names (example: apache2 vs. httpd). Sometimes the same dependency is split into multiple packages (example: python3.9 depends on python3.9-minimal and others). How many times have you seen a machine with Python but without idle or tkinter? And naming conventions are a pain for both build-time and runtime system dependencies. Fortunately we have containers and can run different Linux flavours, including old versions of those flavours, which is very important.

Environment variables: usually used to inject configuration. My GOPATH env var is under my home directory. Maybe an env var name can have a description, a default value, and so on, but not a fixed value, because a fixed value is not easy to share with others.
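That idea could be expressed by describing variables instead of pinning them. A hypothetical TOML sketch (all key names invented here for illustration):

```toml
# Hypothetical layout: describe the variable, don't fix its value.
[[environment_variables]]
name = "GOPATH"
description = "Root of the Go workspace"
default = "$HOME/go"
required = false
```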

To avoid system dependencies, some build tools like https://github.com/joerick/cibuildwheel are centered on CI suites. With a CI you can build on Windows, Linux, macOS, etc. without having to set up virtual machines (libvirt, VirtualBox, ...) or coordinate more-or-less well-configured buildbot farms. Once a wheel is compiled, the build dependencies are no longer needed. cibuildwheel then uses other tools like auditwheel to embed shared libraries into the wheel, so runtime system dependencies are included in the wheel too. Mostly (see the limitations: https://github.com/pypa/auditwheel#limitations).
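A minimal sketch of that CI-centered approach on GitHub Actions, roughly following cibuildwheel's documented usage (the pinned version is just an example; use whatever release is current):

```yaml
# Minimal sketch of a cibuildwheel job; version pin is an example only.
name: Build wheels
on: [push]
jobs:
  build_wheels:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Build wheels
        run: pipx run cibuildwheel==2.16.2
      - uses: actions/upload-artifact@v4
        with:
          path: ./wheelhouse/*.whl
```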

So the requirements should be something like...

* base_system_dependency: a link to an OCI container, for example (open specifications, if a technology has to be chosen). This marks the minimal glibc version for compatibility.
* extra_base_system_repositories: we want modern libraries, with all patches, but running on an old Linux version to improve glibc compatibility.
* build_dependencies: these must exist in base_system_dependency or extra_base_system_repositories.
** build_system_dependencies: things used to test, compile and verify the build. For example, patchelf, which is used by auditwheel.
** build_python_dependencies: Python libraries used to build things, like auditwheel, which is used to pack runtime dependencies into wheels.
* build_steps: maybe even environment variables are a good idea here, if everything runs in containers. This could include unit tests, compilation, integration tests after compilation, ...
* artifacts: wheels, rpm packages, deb packages,...
* runtime_dependencies: everything that cannot be included in our generated artifacts.
** python_runtime_dependencies: the simple part.
** system_runtime_dependencies: for this section, though, the system dependencies will not run containerized, so we are back to different names, available versions, incompatibilities, ...
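Putting those fields together, a hypothetical pyproject.toml section might look like this (every table and key name below is invented for illustration, mirroring the list above; the manylinux image reference is a real one used as an example):

```toml
# All names are hypothetical, mirroring the field list above.
[build-environment]
base_system_dependency = "quay.io/pypa/manylinux2014_x86_64"   # an OCI image
extra_base_system_repositories = ["https://example.org/repo/el7"]
build_steps = ["python -m pip wheel /src -w /wheelhouse"]
artifacts = ["wheel"]

[build-environment.build_dependencies]
build_system_dependencies = ["patchelf"]
build_python_dependencies = ["auditwheel"]

[build-environment.runtime_dependencies]
python_runtime_dependencies = ["numpy"]
system_runtime_dependencies = ["libfoo"]
```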

I think it's possible to generate metadata describing build requirements. Even if it fails for automated builds, it serves as a "TL;DR" guide for how to install the package. But if it can be used to generate automated builds, then repairing the metadata will be easy, and the automated build will run. Then we would need a pyproject2docker or a pyproject2gitlabci or similar. Maybe these reference implementations would be needed to test the specification. The metadata should be used to avoid technological lock-in in builds.
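As a toy illustration of what such a pyproject2docker might do (the metadata keys are the hypothetical ones from the list above, and the yum command assumes a manylinux2014-style base image):

```python
# Toy sketch of a hypothetical "pyproject2docker": turn build metadata
# into a Dockerfile. Field names mirror the draft above, not a standard.
def metadata_to_dockerfile(meta: dict) -> str:
    lines = [f"FROM {meta['base_system_dependency']}"]
    sys_deps = meta.get("build_system_dependencies", [])
    if sys_deps:
        # Package-manager command is distro-specific; yum fits manylinux2014.
        lines.append("RUN yum install -y " + " ".join(sys_deps))
    py_deps = meta.get("build_python_dependencies", [])
    if py_deps:
        lines.append("RUN pip install " + " ".join(py_deps))
    for step in meta.get("build_steps", []):
        lines.append(f"RUN {step}")
    return "\n".join(lines) + "\n"

meta = {
    "base_system_dependency": "quay.io/pypa/manylinux2014_x86_64",
    "build_system_dependencies": ["patchelf"],
    "build_python_dependencies": ["auditwheel"],
    "build_steps": ["python -m pip wheel /src -w /wheelhouse"],
}
print(metadata_to_dockerfile(meta))
```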

If it doesn't end up in a PEP, it's still a good project for filling a private repo with wheels.

Regards,

Javi