setup_requires: the obvious option(?)

Hi all,
The `setup_requires` option to `setup()` is well known to suffer from multiple issues. Most importantly, as it is a keyword argument to `setup()`, it appears too late for modules that may need to be imported for the build to occur (e.g., Cython, for which support must be explicitly provided by setuptools itself rather than by letting Cython hook into it); additionally, people go through various contortions to avoid some `setup_requires` when not building the package (such as checking the value of `sys.argv`). `setup_requires` also uses `easy_install` rather than `pip`, but I do not see why this could not be fixed; let's focus on the first issue instead.
If `setup_requires` appears too late to be useful, the obvious(?) option is to move it earlier: provide a function, say, `setuptools.setup_requires()`, that should be called *before* `setup()` itself, e.g.:
from setuptools import setup, setup_requires

setup_requires("numpy", needed_for=["build_ext"])

try:
    import numpy as np
except ImportError:
    np = None

setup(..., include_dirs=[np.get_include()] if np else [])
When `setup.py` is invoked, either directly or by pip, upon the call to `setup_requires()`, if `sys.argv[1]` is in the `needed_for` kwarg and at least one requirement is missing, `setup_requires()` asks pip to install the required packages (similarly to https://bitbucket.org/dholth/setup-requires) in a temporary directory, and the whole Python process replaces itself (in the `os.execv()` sense) with a new call to `python setup.py` with this temporary directory prepended to the PYTHONPATH. In this new process, the arguments to `setup_requires()` are now available and we can proceed to the rest of `setup.py`.
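To make the mechanism concrete, a minimal sketch of how such a helper could work (this is hypothetical code, not an existing setuptools API; the pip invocation and PYTHONPATH handling are assumptions on my part):

import os
import subprocess
import sys
import tempfile

import pkg_resources


def setup_requires(*requirements, needed_for=()):
    # Hypothetical helper -- not an actual setuptools API.
    # Do nothing unless the distutils command being run needs the requirements.
    if len(sys.argv) < 2 or sys.argv[1] not in needed_for:
        return
    missing = []
    for req in requirements:
        try:
            pkg_resources.require(req)
        except pkg_resources.ResolutionError:
            missing.append(req)
    if not missing:  # after the re-exec below, this branch is taken
        return
    # Install the missing requirements into a temporary directory with pip...
    tmpdir = tempfile.mkdtemp(prefix="setup-requires-")
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--target", tmpdir]
        + missing)
    # ...then replace the current process (execve rather than execv, so the
    # modified environment is passed along) with a fresh `python setup.py ...`
    # that has the temporary directory prepended to PYTHONPATH.
    env = dict(os.environ)
    paths = [tmpdir] + env.get("PYTHONPATH", "").split(os.pathsep)
    env["PYTHONPATH"] = os.pathsep.join(filter(None, paths))
    os.execve(sys.executable, [sys.executable] + sys.argv, env)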
I feel like this idea is natural enough that someone must already have come up with it... but I may be missing something :-)
Best, Antony

On Mon, Aug 29, 2016 at 7:29 PM, Antony Lee anntzer.lee@gmail.com wrote:
Hi all,
The `setup_requires` option to `setup()` is well known to suffer from multiple issues. Most importantly, as it is a keyword argument to `setup()`, it appears too late for modules that may need to be imported for the build to occur (e.g., Cython, for which support must be explicitly provided by setuptools itself rather than by letting Cython hook into it); additionally, people go through various contortions to avoid some `setup_requires` when not building the package (such as checking the value of `sys.argv`). `setup_requires` also uses `easy_install` rather than `pip`, but I do not see why this could not be fixed; let's focus on the first issue instead.
If `setup_requires` appears too late to be useful, the obvious(?) option is to move it earlier: provide a function, say, `setuptools.setup_requires()`, that should be called *before* `setup()` itself, e.g.:
from setuptools import setup, setup_requires

setup_requires("numpy", needed_for=["build_ext"])

try:
    import numpy as np
except ImportError:
    np = None

setup(..., include_dirs=[np.get_include()] if np else [])
I mean this sort of already exists but it's spelled:
from setuptools import Distribution

Distribution({'setup_requires': ['numpy']})
Granted it's non-obvious and doesn't have the needed_for flag, which I like. It's not entirely clear how needed_for would work though. For example, what if the package you're requiring provides the command that you need that package to run?
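For the record, that spelling would be used at the top of a setup.py along these lines (as far as I can tell, constructing the Distribution runs the setup_requires machinery immediately, via easy_install-style egg fetching; the metadata below is a placeholder):

# Existing workaround: force setup_requires to be processed *before*
# the real setup() call, so that numpy is importable below.
from setuptools import Distribution, setup

Distribution({'setup_requires': ['numpy']})

import numpy as np

setup(
    name="example",  # placeholder metadata
    include_dirs=[np.get_include()],
    # ... ext_modules, etc. ...
)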
The same can be done by subclassing commands, and there can be some corner cases where that gets extra tricky (Cython comes to mind).
When `setup.py` is invoked, either directly or by pip, upon the call to `setup_requires()`, if `sys.argv[1]` is in the `needed_for` kwarg and at least one requirement is missing, `setup_requires()` asks pip to install the required packages (similarly to https://bitbucket.org/dholth/setup-requires) in a temporary directory, and the whole Python process replaces itself (in the `os.execv()` sense) with a new call to `python setup.py` with this temporary directory prepended to the PYTHONPATH. In this new process, the arguments to `setup_requires()` are now available and we can proceed to the rest of `setup.py`.
I feel like this idea is natural enough that someone must already have come up with it... but I may be missing something :-)
I'm glad you mentioned Daniel Holth's setup-requires hack. Although I haven't used it myself directly I generally like the concept.
Yeah, setup_requires is a mess, but I'd be skeptical of solving the problem by depending on any new features in setuptools :/
Best, Erik

On Tue, Aug 30, 2016 at 8:09 AM Erik Bray erik.m.bray@gmail.com wrote:
On Mon, Aug 29, 2016 at 7:29 PM, Antony Lee anntzer.lee@gmail.com wrote:
Hi all,
The `setup_requires` option to `setup()` is well known to suffer from multiple issues. Most importantly, as it is a keyword argument to `setup()`, it appears too late for modules that may need to be imported for the build to occur (e.g., Cython, for which support must be explicitly provided by setuptools itself rather than by letting Cython hook into it); additionally, people go through various contortions to avoid some `setup_requires` when not building the package (such as checking the value of `sys.argv`). `setup_requires` also uses `easy_install` rather than `pip`, but I do not see why this could not be fixed; let's focus on the first issue instead.
If `setup_requires` appears too late to be useful, the obvious(?) option is to move it earlier: provide a function, say, `setuptools.setup_requires()`, that should be called *before* `setup()` itself, e.g.:
from setuptools import setup, setup_requires

setup_requires("numpy", needed_for=["build_ext"])

try:
    import numpy as np
except ImportError:
    np = None

setup(..., include_dirs=[np.get_include()] if np else [])
I mean this sort of already exists but it's spelled:
from setuptools import Distribution

Distribution({'setup_requires': ['numpy']})
Granted it's non-obvious and doesn't have the needed_for flag, which I like. It's not entirely clear how needed_for would work though. For example, what if the package you're requiring provides the command that you need that package to run?
The same can be done by subclassing commands, and there can be some corner cases where that gets extra tricky (Cython comes to mind).
When `setup.py` is invoked, either directly or by pip, upon the call to `setup_requires()`, if `sys.argv[1]` is in the `needed_for` kwarg and at least one requirement is missing, `setup_requires()` asks pip to install the required packages (similarly to https://bitbucket.org/dholth/setup-requires) in a temporary directory, and the whole Python process replaces itself (in the `os.execv()` sense) with a new call to `python setup.py` with this temporary directory prepended to the PYTHONPATH. In this new process, the arguments to `setup_requires()` are now available and we can proceed to the rest of `setup.py`.
I feel like this idea is natural enough that someone must already have come up with it... but I may be missing something :-)
I'm glad you mentioned Daniel Holth's setup-requires hack. Although I haven't used it myself directly I generally like the concept.
Yeah, setup_requires is a mess, but I'd be skeptical of solving the problem by depending on any new features in setuptools :/
In June I updated my setup_requires hack to be a PEP 518 implementation. It reads build-system.requires out of pyproject.toml and installs them into an isolated directory if necessary. The basic feature is 14 lines of code not counting blanks, and you use it by prepending the code to your setup.py. Once pip implements PEP 518 that code becomes a no-op.
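In rough outline, the prepended snippet does something like the following (this is a paraphrase of the idea, not the actual setup-requires code; the pytoml dependency and the target directory name are assumptions):

# Paraphrase of the setup-requires bootstrap idea, prepended to setup.py.
import subprocess
import sys

import pkg_resources
import pytoml  # assumption: any TOML parser would do


def ensure_build_requirements():
    with open("pyproject.toml") as f:
        requires = pytoml.load(f)["build-system"]["requires"]
    try:
        # No-op if everything listed is already satisfied.
        pkg_resources.require(requires)
    except pkg_resources.ResolutionError:
        target = "build-requires"  # isolated directory; name is arbitrary
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--target", target]
            + list(requires))
        sys.path.insert(0, target)
        pkg_resources.working_set.add_entry(target)


ensure_build_requirements()
# ... the regular setup() call follows ...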
Antony is right that the main problem with setup_requires is that it happens too late, or that you have to write a setuptools extension to use it. Most people do not know that it is possible to write a setuptools extension, let alone want to write one.

2016-08-30 5:08 GMT-07:00 Erik Bray erik.m.bray@gmail.com:
I mean this sort of already exists but it's spelled:
from setuptools import Distribution

Distribution({'setup_requires': ['numpy']})
Granted it's non-obvious and doesn't have the needed_for flag, which I like. It's not entirely clear how needed_for would work though. For example, what if the package you're requiring provides the command that you need that package to run?
needed_for just textually checks sys.argv[1], and does so before the call to setup() itself happens, so that's not a problem.
The same can be done by subclassing commands, and there can be some corner cases where that gets extra tricky (Cython comes to mind).
I personally don't like subclassing commands at all (it's not very composable -- what happens if two projects both attempt to subclass build_ext?). But even then, how is Cython an issue?
I'm glad you mentioned Daniel Holth's setup-requires hack. Although I haven't used it myself directly I generally like the concept.
I am not really a fan of PEP 518 in general. Basically, the idea of setup.py is that declarative languages are not sufficient to express a build system (and AFAICT this is always going to be the case for expressing, say, compiler flags for extensions), so I'd rather just accept that and stick everything in setup.py instead of adding more parameter files. What if someone wants dynamic build dependencies?
Yeah, setup_requires is a mess, but I'd be skeptical of solving the problem by depending on any new features in setuptools :/
This could also go into pip (perhaps a better solution now that pip is de facto stdlib (via ensurepip))... we'll need something new somewhere anyway.
Best, Erik
Best,
Antony

On Tue, Aug 30, 2016, at 05:51 PM, Antony Lee wrote:
I am not really a fan of PEP 518 in general. Basically, the idea of setup.py is that declarative languages are not sufficient to express a build system (and AFAICT this is always going to be the case for expressing, say, compiler flags for extensions), so I'd rather just accept that and stick everything in setup.py instead of adding more parameter files. What if someone wants dynamic build dependencies?
Dynamic build deps aren't precluded - the idea is that the build system can discover additional dependencies when it runs, while the static build-system field specifies just what's required to run the build system itself. However, the build system interface was split out into separate PEPs (517 & 516 are alternatives) to allow 518 to go forwards.
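Concretely, the static part PEP 518 specifies is just one table in pyproject.toml; everything beyond it, dynamically discovered dependencies included, remains the build system's business. (The numpy entry below is only an example of a build dependency, not part of the PEP.)

# pyproject.toml -- the static field defined by PEP 518
[build-system]
# What must be installed before the build system itself can run;
# numpy is just an example of a compiled-extension build dependency.
requires = ["setuptools", "wheel", "numpy"]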
I take totally the opposite view: we should make as much metadata as possible declarative, even though we know we can't define a totally general build system with purely declarative information.

On Tue, Aug 30, 2016 at 1:13 PM Thomas Kluyver thomas@kluyver.me.uk wrote:
On Tue, Aug 30, 2016, at 05:51 PM, Antony Lee wrote:
I am not really a fan of PEP 518 in general. Basically, the idea of setup.py is that declarative languages are not sufficient to express a build system (and AFAICT this is always going to be the case for expressing, say, compiler flags for extensions), so I'd rather just accept that and stick everything in setup.py instead of adding more parameter files. What if someone wants dynamic build dependencies?
Dynamic build deps aren't precluded - the idea is that the build system can discover additional dependencies when it runs, while the static build-system field specifies just what's required to run the build system itself. However, the build system interface was split out into separate PEPs (517 & 516 are alternatives) to allow 518 to go forwards.
I take totally the opposite view: we should make as much metadata as possible declarative, even though we know we can't define a totally general build system with purely declarative information.
This comes up over and over again because we've been living with this system for long enough that the build script and metadata are together both in setup.py and in people's brains. But there is a whole lot of stuff that makes perfect sense in a static file. For example: name, version, packages, install_requires, extras_require, description, license, classifiers, keywords, author, url, entry_points. The only thing that would cause trouble is if the system had no available build script.
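Purely as an illustration of how naturally those fields read as data (this table is hypothetical; nothing like it had been standardized at the time of this thread):

# Hypothetical static metadata -- not a standardized format, just an
# illustration of the fields listed above sitting comfortably in a file.
[metadata]
name = "example"
version = "1.0"
description = "An example package"
license = "MIT"
keywords = ["example"]
classifiers = ["Programming Language :: Python :: 3"]
install_requires = ["requests >= 2.0"]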

On Aug 30, 2016, at 2:32 PM, Daniel Holth dholth@gmail.com wrote:
name, version, packages, install_requires, extras_require, description, license, classifiers, keywords, author, url, entry_points.
Out of these, a number of them are regularly dynamic in people’s setup.py as is. The version number is often dynamically computed based on things like git tags, packages can be computed based on Python version, install_requires and extras_require regularly get computed based on things like Python version, OS, etc. (although environment markers ease some of this), but also things like “We support Numpy >= for building from source, but once you’ve built from source you only support Numpy >= VERSION_YOU_BUILT_AGAINST”.
Outside of “name”, it’s not entirely unreasonable to find reasons why a lot of things need to be dynamic. Although there’s a difference in what needs to be dynamic when pulling from a VCS and when pulling from a sdist, and currently there’s not really a whole lot of difference in terms of capability or how they are handled.
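For illustration, the kinds of dynamic computation described above look something like this in a setup.py (the helper and the fallback values are made up for the example):

import subprocess
import sys

from setuptools import setup


def version_from_git():
    # Derive the version from the latest git tag; fall back to a
    # placeholder when building outside a git checkout (e.g. a tarball).
    try:
        out = subprocess.check_output(["git", "describe", "--tags"])
        return out.decode().strip().lstrip("v")
    except (OSError, subprocess.CalledProcessError):
        return "0.0.0"


install_requires = []
if sys.version_info < (3, 4):
    install_requires.append("enum34")  # backport needed only on old Pythons

setup(
    name="example",
    version=version_from_git(),
    install_requires=install_requires,
)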
— Donald Stufft

On Tue, Aug 30, 2016 at 4:06 PM Donald Stufft donald@stufft.io wrote:
On Aug 30, 2016, at 2:32 PM, Daniel Holth dholth@gmail.com wrote:
name, version, packages, install_requires, extras_require, description, license, classifiers, keywords, author, url, entry_points.
Out of these, a number of them are regularly dynamic in people’s setup.py as is. The version number is often dynamically computed based on things like git tags, packages can be computed based on Python version, install_requires and extras_require regularly get computed based on things like Python version, OS, etc. (although environment markers ease some of this), but also things like “We support Numpy >= for building from source, but once you’ve built from source you only support Numpy >= VERSION_YOU_BUILT_AGAINST”.
Outside of “name”, it’s not entirely unreasonable to find reasons why a lot of things need to be dynamic. Although there’s a difference in what needs to be dynamic when pulling from a VCS and when pulling from a sdist, and currently there’s not really a whole lot of difference in terms of capability or how they are handled.
Of course pip continues to call egg_info before trusting the metadata from any sdist and 90k pypi projects say this isn't changing. Once you need dynamic static metadata, madness.
In other systems I've worked on I sometimes have make-like rules that automatically rebuild static metadata depending on other files, like copying a version number between a .json and an .xml file - reprogramming the system that uses the .xml file is not an option. For example a rule could watch certain files in the .git directory to regenerate a version number automatically as part of the build if .git changed, and do nothing if the .git directory was absent, as in a tarball dist. This works pretty well. Once you have a system that's easy to customize with make-like rules, there are all sorts of trivial build or housekeeping tasks you might decide to do that would never be considered in a harder-to-customize system.
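A minimal sketch of such a rule (the helper is hypothetical, nothing standard):

import os
import subprocess


def maybe_refresh_version(target="VERSION.txt", watch=".git/HEAD"):
    # Make-like rule: regenerate the static version file only when the
    # git metadata is newer; do nothing in a tarball where .git is absent.
    if not os.path.exists(watch):
        return  # tarball dist: keep the shipped file as-is
    if (os.path.exists(target)
            and os.path.getmtime(target) >= os.path.getmtime(watch)):
        return  # already up to date
    version = subprocess.check_output(
        ["git", "describe", "--tags"]).decode().strip()
    with open(target, "w") as f:
        f.write(version + "\n")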

On 31 August 2016 at 07:04, Daniel Holth dholth@gmail.com wrote:
In other systems I've worked on I sometimes have make-like rules that automatically rebuild static metadata depending on other files, like copying a version number between a .json and an .xml file - reprogramming the system that uses the .xml file is not an option. For example a rule could watch certain files in the .git directory to regenerate a version number automatically as part of the build if .git changed, and do nothing if the .git directory was absent, as in a tarball dist. This works pretty well. Once you have a system that's easy to customize with make-like rules, there are all sorts of trivial build or housekeeping tasks you might decide to do that would never be considered in a harder-to-customize system.
CPython's own build process takes this to extremes by sometimes bootstrapping a version of Python without an import system in order to refreeze importlib before continuing with building the normal version. Argument Clinic (which generates C function argument handling preambles from specially formatted comments) similarly needs a pre-existing Python build in order to run.
However, we successfully hide that complexity from folks that just want to build their own Python from source by checking in the generated files.
Similarly, it wouldn't astonish me if we eventually see an emergent practice of people writing pyproject.toml.in files for complex projects, in order to move some particular forms of complexity away from build time and towards development time - this would be a similar practice to folks using autoconf to generate a project's C Makefile.
Part of the beauty of the pyproject.toml format is that as publication and installation system maintainers, we don't need to care about this problem beyond ensuring that "maintain it by hand" is a viable and fully supported option for the majority of projects - as long as the resulting file gets checked in, folks are free to autogenerate it from another data source if they choose to do so.
Cheers, Nick.

Similarly, it wouldn't astonish me if we eventually see an emergent practice of people writing pyproject.toml.in files for complex projects, in order to move some particular forms of complexity away from build time and towards development time - this would be a similar practice to folks using autoconf to generate a project's C Makefile.
This actually formulates, much better than I could have, the reasons why I dislike PEP 518: it's only going to lead to reinventing the wheel (AKA autoconf, which is a pretty big wheel to reinvent).
Antony

On 2 September 2016 at 13:30, Antony Lee anntzer.lee@gmail.com wrote:
Similarly, it wouldn't astonish me if we eventually see an emergent practice of people writing pyproject.toml.in files for complex projects, in order to move some particular forms of complexity away from build time and towards development time - this would be a similar practice to folks using autoconf to generate a project's C Makefile.
This actually formulates, much better than I could have, the reasons why I dislike PEP 518: it's only going to lead to reinventing the wheel (AKA autoconf, which is a pretty big wheel to reinvent).
Unlike autoconf, we don't need to support building arbitrary C/C++ projects - rather, we just want people to have a backend-independent way to tell Python-centric toolchains how to invoke their *existing* build system (whether that's autoconf/make, CMake, Scons, waf, Meson, yotta, etc).
It's the current *lack* of that ability to readily integrate with existing build tools (whether written in Python or not) that prompts people to reinvent the world.
Defining and supporting PEP 518 means that a possible future workflow for autoconf-using projects would be something like:
$ ./configure && make pyproject.toml && pip install -e .
In that kind of scenario, the "tell Python tools how to build our Python bindings" file becomes just another output of a project's existing build system, which is entirely feasible with a well documented static format, but impractical with the current underspecified and underdocumented setup.py based approach.
Cheers, Nick.
participants (6)
- Antony Lee
- Daniel Holth
- Donald Stufft
- Erik Bray
- Nick Coghlan
- Thomas Kluyver