abstract build system approaches redux

This is my attempt to consolidate the three different proposals we have at the moment into a single set of design choices.
We have three draft designs - from Nathaniel (517), Donald (unnumbered) and mine (516).
For purposes of comparison, I've skipped all rationale here, and only listed the highlights that differ across the proposals. If you haven't read all three, you should before continuing.
My goal is to let us pick a final shape - and then we can shave the rough edges off of that shape to get a final PEP (or PEPs)
517:
- Python interface to the system
- build requires, optional wheel metadata (to directory on disk), build wheel, install editable commands

516:
- CLI interface to the system
- build requires, mandatory wheel metadata, build wheel, install editable commands

Donald's:
- Python interface to the system
- defines new 'source wheel' concept
- source_metadata, source_wheel, binary_metadata, binary_wheel commands
- 'develop' // editable not addressed currently
- outputs directories on disk, rather than zipped up things
Ok, now here's my view on the differences...
Python vs CLI - I don't care enough to argue. I think a CLI is better in this space, as running commands is the lingua franca of CLIs; previously Donald has advocated for that as well, but his proposal is now a Python API. *shrug*. I want the BDFL-Delegate to choose.
516 and 517 are otherwise identical in all meaningful ways - there's plenty of stuff to bikeshed on (e.g. should wheel metadata be optional or mandatory), but I don't think it is crucial: we can always change this in future by issuing a new version. It's not free, but if we can't agree now, we can at least find out later. Donald's and 516 both have the ability to get metadata out explicitly as a mandatory thing, so I think we should make it mandatory and weaken it in future if e.g. flit eventually comes back saying it's a burden. [I don't believe it will.]
File formats - BDFL-Delegate can pick. I so don't care :).
New source format. I'm quite strongly against this at the moment: building from VCS is now a fairly major thing in the community, and we can't mandate new URLs for all sources. We're thus going to be forcing a change on all tooling (e.g. Debian, Fedora, pip and others) that knows how to build things from an arbitrary VCS URL. A new source extension on PyPI doesn't reduce the impact of that at all, and the code needed to handle it will also handle sdists on PyPI that have the new config file and may be missing a setup.py. If there is a new source extension on PyPI, we should expect that folk using e.g. flit will *not* upload old style sdists to PyPI - they don't at the moment, and the objection to doing so will *remain*.

-> We don't save any grief or friction for the adoption of the new thing, nor enable new build systems any more easily. A shim to the new shiny would enable current sdists for anything adopting the new thing in exactly the same manner as it would if we reuse the extension.

I can see an argument for a new iteration of the sdist format for better metadata, and to be able to tell what metadata is trustable etc, but I don't see that as related to the problem of enabling third party build systems - and since all the easy cases will be solved by binary wheels, I think we're in rapidly diminishing territory here. If something can't upload a binary wheel, it's quite likely to have complex dependencies - and we haven't solved the numpy ABI problem yet, so I'd like to actually focus on that and solve it well - and once it's solved, see if we can retrofit it into the existing sdist format, or if a new one is actually needed.
Develop is essential to tackle IMO. We can either fully PEP define it now - and if so we should stop talking about this and start talking about that - or, if we don't have bandwidth to do that yet (and AFAICT we don't, based on IRC discussions and the headaches of interactions with the three different namespace package possibilities, virtualenvs and user installs...), we should defer it explicitly rather than leave it undefined.
Outputting the wheel as an unzipped directory on disk would be a nice optimisation for big trees, I think, so perhaps we should apply that to 516/517 to reduce the set of things we're discussing.
-Rob
516: https://github.com/pypa/interoperability-peps/blob/master/pep-0516-build-sys...
517: https://github.com/pypa/interoperability-peps/blob/master/pep-0517-build-sys...
Donald's: https://github.com/pypa/interoperability-peps/compare/master...dstufft:build...

I'm hoping to get some time to finish fleshing mine out this weekend. I've been unable to do much except in small spurts this last week due to complications from a temporary filling after a root canal (it threw my bite off and I've been having jaw pain + a feeling of drunkenness + extreme exhaustion). The temporary filling is coming out tomorrow so hopefully that will be cleared up by then.
Sent from my iPhone
On Mar 2, 2016, at 7:43 PM, Robert Collins robertc@robertcollins.net wrote:
We have three draft designs - from Nathaniel (517), Donald (unnumbered) and mine (516).

On 3 March 2016 at 10:43, Robert Collins robertc@robertcollins.net wrote:
Python vs CLI - I don't care enough to argue. I think a CLI is better in this space, as running commands is the lingua franca of CLIs; previously Donald has advocated for that as well, but his proposal is now a Python API. *shrug*. I want the BDFL-Delegate to choose.
I'm happy to let the discussion run for the other open questions you mention, but I'm now prepared to pronounce on this aspect based on one specific factor: I don't want to mandate that all future build systems for Python projects (whether front ends or back ends) must be written in Python.
If somebody comes up with an all-singing, all-dancing Python build system that happens to be written in Go, or Rust, or Haskell, or that they made by adapting an existing build system for another language ecosystem to have native support for also building Python packages, I'd like for the Python packaging ecosystem to be able to handle that without significant fuss.
Attaining that level of cross-language interoperability then necessarily means defining the formal build system abstraction interface at the CLI boundary, rather than as a Python API.
Defining a helper library/framework to make it *easier* to write new build systems in Python would still make sense, but that can be done in the usual way that any other library gains widespread acceptance: by being easier to use than a "DIY" approach when it comes to directly implementing the underlying interoperability specification.
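To make the CLI-boundary idea concrete, here is a minimal frontend-side sketch. All command names, flags and the backend name here are hypothetical, invented for illustration - none of the three drafts standardises them:

```python
import subprocess

def backend_argv(backend, verb, source_dir, extra=()):
    """Build the argv for a hypothetical CLI build-backend call.

    `backend` is whatever command the project's config names (it could
    be a Go, Rust or Haskell binary); `verb` is one of the standardised
    subcommands, e.g. "build-requires", "metadata", "build-wheel".
    """
    return [backend, verb, source_dir, *extra]

def build_wheel(backend, source_dir, output_dir):
    # A frontend like pip would shell out; the backend's implementation
    # language is invisible at this process boundary.
    argv = backend_argv(backend, "build-wheel", source_dir,
                        ["--output-dir", output_dir])
    subprocess.run(argv, check=True)
```

Because the contract is "run this command with these arguments", nothing about the backend's internals leaks across the boundary.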
Regards, Nick.

On Mar 3, 2016, at 8:19 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 3 March 2016 at 10:43, Robert Collins robertc@robertcollins.net wrote:
Python vs CLI - I don't care enough to argue. I think a CLI is better in this space, as running commands is the lingua franca of CLIs; previously Donald has advocated for that as well, but his proposal is now a Python API. *shrug*. I want the BDFL-Delegate to choose.
I'm happy to let the discussion run for the other open questions you mention, but I'm now prepared to pronounce on this aspect based on one specific factor: I don't want to mandate that all future build systems for Python projects (whether front ends or back ends) must be written in Python.
If somebody comes up with an all-singing, all-dancing Python build system that happens to be written in Go, or Rust, or Haskell, or that they made by adapting an existing build system for another language ecosystem to have native support for also building Python packages, I'd like for the Python packaging ecosystem to be able to handle that without significant fuss.
Attaining that level of cross-language interoperability then necessarily means defining the formal build system abstraction interface at the CLI boundary, rather than as a Python API.
Defining a helper library/framework to make it *easier* to write new build systems in Python would still make sense, but that can be done in the usual way that any other library gains widespread acceptance: by being easier to use than a "DIY" approach when it comes to directly implementing the underlying interoperability specification.
I'd like to push back against this, speaking as someone who was originally pro CLI:
I think that a Python API is actually better for one reason: introspection. I cannot think of a particularly great way to have a CLI based build tool *evolve* with new APIs that are not user facing without requiring end users to do something like mark "ok now my thing is X compatible" or without inventing some sort of protocol negotiation phase.
For instance, let's say that we need a ``metadata`` API that can output metadata 1.x style metadata. That's great, and right now we can just assume that it exists. However, let's say in the future we add a metadata 2.0 spec that isn't backwards compatible. Now how do we handle this? A new version of pip can't just assume that the ``metadata`` command is always going to return 2.x metadata, because that would break all existing build tools, so it needs some method to determine what capabilities a particular build system has.
I see a few options:
* If we're using a CLI based thing, we need to create a negotiation phase, which I think is a bad idea because it's kind of complicated and error prone. Looking at TLS or HTTP, there's more than one bug that exists because of this.
* If we're using a CLI based thing, we need to bake into the file what the capabilities of the build system are. Maybe using something like:
    generate-metadata-2: true

  or:

    metadata2-generation-command: ${PYTHON} -m generate-metadata
However, I think this is bad because it makes it harder to get automatic "wins" for new features: you have to convince every single author to go and modify their file to tell pip (or whatever) that this new feature is available, and they then also need to force a >= in their build dependencies, since they're mandating some new feature by baking it directly into their project.
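To illustrate the friction being criticised here, this is roughly what per-project capability flags would mean for a frontend (the key names are invented, not from any proposal):

```python
# Hypothetical per-project build config, already parsed into a dict.
# The frontend may only use the newer feature if the project author
# has hand-added the opt-in flag to their file.
def supports_metadata_2(config):
    return bool(config.get("generate-metadata-2", False))

old_project = {"build-command": "mybuild"}  # author never updated it
new_project = {"build-command": "mybuild", "generate-metadata-2": True}
```

Every project whose author never touches the flag is stuck on the old behaviour even when its build tool already supports the new one.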
* If we're using a Python based API, we can simply just say that there can be no backwards incompatible changes made to an API name once defined. If we want to create a new function that produces the new style metadata, then we can simply do a check like:
    if hasattr(api, "metadata2"):
        ...  # Do Metadata 2.x code
    else:
        ...  # Do Metadata 1.x code
This not only allows all projects to automatically start using this new feature as soon as a build tool implements it, getting automatic wins, but it also makes it trivial for a project to support both old and new versions of pip, since there will be a different API name for a backwards incompatible change. Old pip would just use the old code, and new pip would use the new code.
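Fleshed out as a small frontend-side helper, the introspection approach might look like this (the hook names ``metadata``/``metadata2`` and the sample backends are illustrative only):

```python
def get_metadata(api, source_dir):
    """Use the newest metadata hook the backend object exposes."""
    if hasattr(api, "metadata2"):
        return "2.x", api.metadata2(source_dir)
    return "1.x", api.metadata(source_dir)

# A backend that only knows the old hook keeps working unchanged;
# one that grows metadata2 is picked up automatically, no config edits.
class OldBackend:
    @staticmethod
    def metadata(source_dir):
        return {"name": "demo"}

class NewBackend(OldBackend):
    @staticmethod
    def metadata2(source_dir):
        return {"name": "demo", "metadata-version": "2.0"}
```

The versioning decision lives entirely in the frontend and the backend's attribute set; no negotiation round-trip and no per-project flag is needed.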
The only real benefits I can see to using a CLI based over a Python based are:
* We move some code out of pip into each individual build backend, making the process boundaries clearer. This is a nice thing, and it is the primary reason I wanted a CLI based API. However, I don't think it outweighs the ability to introspect what API methods are available.
* We make it easier to use a non Python based build system. This is also a nice thing, however I don't think it should be a major decider in what API we provide. Any reasonable build system is going to have to be available via ``pip install`` in some fashion, so even if you write your build system in Go or Rust, you're going to have to create a Python package for it anyways, and if you're doing that, adding a tiny shim is pretty trivial, something like:
    import os.path
    import subprocess

    BIN = os.path.join(os.path.dirname(__file__), "mytool")

    def some_api_function(*args, **kwargs):
        flags = convert_args_kwargs_to_flags(*args, **kwargs)
        subprocess.run([BIN, *flags], check=True)
I don't believe it to be a substantial burden to need to write a tiny wrapper if you're going to do something which I believe is going to be very unlikely.
In the end, I think this comes down to not optimizing for the least common case at the expense of the ability to more easily evolve the API in the future.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Thu, 3 Mar 2016 08:44:56 -0500 Donald Stufft donald@stufft.io wrote:
I'd like to push back against this, speaking as someone who was originally pro CLI:
I think that a Python API is actually better for one reason: introspection. I cannot think of a particularly great way to have a CLI based build tool *evolve* with new APIs that are not user facing without requiring end users to do something like mark "ok now my thing is X compatible" or without inventing some sort of protocol negotiation phase.
I'll add that some build systems may have a non-trivial startup cost (for example conda-build sets up an isolated environment with well-defined binaries in it), therefore issuing several CLI commands can be significantly more costly (and/or difficult to optimize for) than issuing several API calls from the same single process invocation.
Regards
Antoine.

On 3 March 2016 at 23:44, Donald Stufft donald@stufft.io wrote:
We make it easier to use a non Python based build system. This is also a nice thing, however I don't think it should be a major decider in what API we provide. Any reasonable build system is going to have to be available via ``pip install`` in some fashion, so even if you write your build system in Go or Rust, you're going to have to create a Python package for it anyways, and if you're doing that, adding a tiny shim is pretty trivial, something like:
    import os.path
    import subprocess

    BIN = os.path.join(os.path.dirname(__file__), "mytool")

    def some_api_function(*args, **kwargs):
        flags = convert_args_kwargs_to_flags(*args, **kwargs)
        subprocess.run([BIN, *flags], check=True)
I don't believe it to be a substantial burden to need to write a tiny wrapper if you're going to do something which I believe is going to be very unlikely.
Ah, you're right, I hadn't accounted for the fact that the same shim that makes a non-Python tool installable as a build dependency could also handle the adaptation from a Python API to a CLI or FFI based approach, so putting the standardised interface in Python doesn't raise any insurmountable barriers to cross-language interoperability - it just moves the additional complexity to the less common case.
Given that, I'm going to go back to reserving judgement on *all* of the points Robert mentioned, at least until you've had a chance to finish writing up your own proposal - the determining factor I thought I had found turned out not to be so determining after all :)
Regards, Nick.

Just to set expectations: this whole process seems stalled to me; I'm going to context switch and focus on things that can move forward. Someone please ping me when it's relevant to put effort in again :).
-Rob
On 4 March 2016 at 03:11, Nick Coghlan ncoghlan@gmail.com wrote:
On 3 March 2016 at 23:44, Donald Stufft donald@stufft.io wrote:
We make it easier to use a non Python based build system. This is also a nice thing, however I don't think it should be a major decider in what API we provide. Any reasonable build system is going to have to be available via ``pip install`` in some fashion, so even if you write your build system in Go or Rust, you're going to have to create a Python package for it anyways, and if you're doing that, adding a tiny shim is pretty trivial, something like:
    import os.path
    import subprocess

    BIN = os.path.join(os.path.dirname(__file__), "mytool")

    def some_api_function(*args, **kwargs):
        flags = convert_args_kwargs_to_flags(*args, **kwargs)
        subprocess.run([BIN, *flags], check=True)
I don't believe it to be a substantial burden to need to write a tiny wrapper if you're going to do something which I believe is going to be very unlikely.
Ah, you're right, I hadn't accounted for the fact that the same shim that makes a non-Python tool installable as a build dependency could also handle the adaptation from a Python API to a CLI or FFI based approach, so putting the standardised interface in Python doesn't raise any insurmountable barriers to cross-language interoperability - it just moves the additional complexity to the less common case.
Given that, I'm going to go back to reserving judgement on *all* of the points Robert mentioned, at least until you've had a chance to finish writing up your own proposal - the determining factor I thought I had found turned out not to be so determining after all :)
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 15 March 2016 at 18:34, Robert Collins robertc@robertcollins.net wrote:
Just to set expectations: this whole process seems stalled to me; I'm going to context switch and focus on things that can move forward. Someone please ping me when it's relevant to put effort in again :).
I'm not sure what is needed to get this process moving again, but when it does, it would probably be worth reviewing https://github.com/pypa/pip/issues/562
The issue there is that --editable and --target don't work well together in pip; if I understand the details, it's because setuptools needs extra information to know how to do "setup.py develop" in the face of --target. The question for this thread is: does the build system interface need to know what *other* options get passed to pip as part of deciding what the correct "develop" command should be (or indeed any of the other commands)?
Note that I don't have any vested interest in the specific issue in question, and "this is a corner case we shouldn't try to support going forward" is a perfectly acceptable answer as far as I'm concerned. But I thought it was worth raising in case it points to a limitation of the build system command discovery process.
Paul

On 3 March 2016 at 13:44, Donald Stufft donald@stufft.io wrote:
- If we're using a CLI based thing, we need to create a negotiation phase, which I think is a bad idea because it's kind of complicated and error prone. Looking at TLS or HTTP, there's more than one bug that exists because of this.
Additionally, it seems to me that a negotiation means additional round-trips to the build system. Which means extra subprocess calls. On Windows in particular, the cost of a subprocess call is not that cheap, so this could result in increased build times.
I understand that the actual build steps will likely always involve a subprocess, for isolation purposes. But avoiding subprocess calls for "admin" issues like "what version of the protocol do you support?" is a worthwhile goal.
(Beyond noting the above as a data point, I don't really have a strong opinion on a CLI vs Python interface, though).
Paul
participants (5): Antoine Pitrou, Donald Stufft, Nick Coghlan, Paul Moore, Robert Collins