[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

Nathaniel Smith njs at pobox.com
Wed Nov 11 04:04:46 EST 2015

On Tue, Nov 10, 2015 at 11:27 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> On 11 November 2015 at 19:49, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 11 November 2015 at 16:19, Robert Collins <robertc at robertcollins.net> wrote:
> ...
>>> pip is going to be invoking a CLI *no matter what*. Thats a hard
>>> requirement unless Python's very fundamental import behaviour changes.
>>> Slapping a Python API on things is lipstick on a pig here IMO: we're
>>> going to have to downgrade any richer interface; and by specifying the
>>> actual LCD as the interface it is then amenable to direct exploration
>>> by users without them having to reverse engineer an undocumented thunk
>>> within pip.
>> I'm not opposed to documenting how pip talks to its worker CLI - I
>> just share Nathan's concerns about locking that down in a PEP vs
>> keeping *that* CLI within pip's boundary of responsibilities, and
>> having a documented Python interface used for invoking build systems.
> I'm also very wary of something that would be an attractive nuisance.
> I've seen nothing suggesting that a Python API would be anything but:
>  - it won't be usable [it requires the glue to set up an isolated
> context, which is buried in pip] in the general case

This is exactly as true of a command line API -- in the general case
it also requires the glue to set up an isolated context. People who go
ahead and run 'flit' from their global environment instead of in the
isolated build environment will experience exactly the same problems
as people who go ahead and import 'flit.build_system_api' in their
global environment, so I don't see how one is any more of an
attractive nuisance than the other?

AFAICT the main difference is that "setting up a specified Python
context and then importing something and exploring its API" is
literally what I do all day as a Python developer. Either way you have
to set stuff up, and then once you do, in the Python API case you get
stuff like tab completion, ipython introspection (? and ??), etc. for
free.

>  - no matter what we do, pip can't benefit from it beyond the
> subprocess interface pip needs, because pip *cannot* import and use
> the build interface

Not sure what you mean by "benefit" here. At best this is an argument
that the two options have similar capabilities, in which case I would
argue that we should choose the one that leads to simpler and thus
more probably bug-free specification language.

But even this isn't really true -- the difference between them is that
either way you have a subprocess API, but with a Python API, the
subprocess interface that pip uses has the option of being improved
incrementally over time -- including, potentially, to take further
advantage of the underlying richness of the Python semantics. Sure,
maybe the first release would just take all exceptions and map them
into some text printed to stderr and a non-zero return code, and
that's all that pip would get. But if someone had an idea for how pip
could do better than this by, I dunno, encoding some structured
metadata about the particular exception that occurred and passing this
back up to pip to do something intelligent with it, they absolutely
could write the code and submit a PR to pip, without having to write a
new PEP.
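
To make the exception-mapping idea concrete, here is a rough sketch of
what such a worker could look like. Everything in it is an assumption
for illustration: run_hook, failing_hook and the JSON payload layout
are hypothetical names, not anything pip or a draft PEP defines.

```python
import json
import sys
import traceback


def run_hook(hook, *args):
    """Call a build hook and translate the outcome into (exit_code, payload).

    Hypothetical worker-side helper: the payload is a JSON string that a
    frontend could read back from the subprocess.
    """
    try:
        result = hook(*args)
    except Exception as exc:
        # Baseline behaviour: human-readable text on stderr plus a
        # non-zero exit code.
        traceback.print_exc(file=sys.stderr)
        # Possible later refinement: structured metadata about the failure.
        payload = {
            "status": "error",
            "type": type(exc).__name__,
            "message": str(exc),
        }
        return 1, json.dumps(payload)
    return 0, json.dumps({"status": "ok", "result": result})


def failing_hook(source_dir):
    # Hypothetical backend hook that always fails, for demonstration.
    raise ValueError("cannot build " + source_dir)


if __name__ == "__main__":
    code, payload = run_hook(failing_hook, "src")
    print(code, payload)
```

The point of the sketch is only that the structured part can be added
later inside the worker without touching the hook signature.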

> tl;dr - I think making the case that the layer we define should be a
> Python protocol rather than a subprocess protocol requires some really
> strong evidence. We're *not* dealing with the same moving parts that
> typical Python stuff requires.

I'm very confused and honestly do not understand what you find
attractive about the subprocess protocol approach. Even your arguments
above aren't really even trying to be arguments that it's good, just
arguments that the Python API approach isn't much better. I'm sure
there is some reason you like it, and you might even have said it but
I missed it because I disagreed or something :-). But literally the
only reason I can think of right now for why one would prefer the
subprocess approach is that it lets one remove 50 lines of "worker
process" code from pip and move them into the individual build
backends instead, which I guess is a win if one is focused narrowly on
pip itself. But surely there is more I'm missing?

(And even this lines-of-code argument is actually pretty dubious --
right now your draft PEP is importing-by-reference an entire existing
codebase (!) for shell variable expansion in command lines, which is
code that simply doesn't need to exist in the Python API approach. I'd
be willing to bet that your approach requires more code in pip than
mine :-).)

>> However, I've now realised that we're not constrained even if we start
>> with the CLI interface, as there's still a migration path to a Python
>> API based model:
>> Now: documented CLI for invoking build systems
>> Future: documented Python API for invoking build systems, default
>> fallback invokes the documented CLI
> Or we just issue an updated bootstrap schema, and there's no fallback
> or anything needed.

Oh no! But this totally gives up the most brilliant part of your
original idea! :-)

In my original draft, I had each hook specified separately in the
bootstrap file, e.g. (super schematically):

  build-requirements = flit-build-requirements
  do-wheel-build = flit-do-wheel-build
  do-editable-build = flit-do-editable-build

and you counterproposed that instead there should just be one line like

  build-system = flit-build-system

and this is exactly right, because it means that if some new
capability is added to the spec (e.g. a new hook -- like
hypothetically imagine if we ended up deferring the equivalent of
egg-info or editable-build-mode to v2), then the new capability just
needs to be implemented in pip and in flit, and then all the projects
that use flit immediately gain superpowers without anyone having to go
around and manually change all the bootstrap files in every project.

But for this to work it's crucial that the pip<->build-system
interface have some sort of versioning or negotiation beyond the
bootstrap file's schema version.
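
One possible form of that negotiation with a Python API is plain
attribute lookup on the build-system object. The sketch below is only
an illustration under assumed names: KNOWN_HOOKS, supported_hooks and
the hook spellings are hypothetical, loosely mirroring the schematic
names above.

```python
import types

# Hypothetical hook names; nothing in this thread fixes their spelling.
KNOWN_HOOKS = ("build_requirements", "do_wheel_build", "do_editable_build")


def supported_hooks(build_system):
    """Return the known hooks that this build system actually provides."""
    return [
        name
        for name in KNOWN_HOOKS
        if callable(getattr(build_system, name, None))
    ]


# Stand-in for an older backend that predates the editable-build hook.
old_backend = types.SimpleNamespace(
    build_requirements=lambda: [],
    do_wheel_build=lambda: "pkg.whl",
)

if __name__ == "__main__":
    print(supported_hooks(old_backend))
```

A frontend that finds a hook missing can fall back or report that the
feature is unavailable, and a backend that gains the hook later needs
no change to any project's bootstrap file.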

>> So the CLI documented in the PEP isn't *necessarily* going to be the
>> one used by pip to communicate into the build environment - it may be
>> invoked locally within the build environment.
> No, it totally will be. Exactly as setup.py is today. Thats
> deliberate: The *new* thing we're setting out to enable is abstract
> build systems, not reengineering pip.
> The future - sure, someone can write a new thing, and the necessary
> capability we're building in to allow future changes will allow a new
> PEP to slot in easily and take on that [non trivial and substantial
> chunk of work]. (For instance, how do you do compiler and build system
> specific options when you have a CLI to talk to pip with)?

I dunno, that seems pretty easy? My original draft just suggested that
the build hook would take a dict of string-valued keys, and then we'd
add some options to pip like "--project-build-option foo=bar" that
would set entries in that dict, and that's pretty much sufficient to
get the job done. To enable backcompat you'd also want to map the old
--install-option and --build-option switches to add entries to some
well-known keys in that dict. But none of the details here need to be
specified, because it's up to individual projects/build-systems to
assign meaning to this stuff and individual build-frontends like pip
to provide an interface to it -- at the build-frontend/build-backend
interface layer we just need some way to pass through the blobs.
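
As a rough sketch of the frontend side, assuming the hypothetical
"--project-build-option" switch mentioned above and a hypothetical
helper called parse_build_options, the repeated KEY=VALUE arguments
can be collected into a dict with the standard argparse module:

```python
import argparse


def parse_build_options(argv):
    """Collect repeated --project-build-option KEY=VALUE into a dict.

    The switch name follows the example in the text; it is not an
    existing pip option.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--project-build-option",
        action="append",
        default=[],
        metavar="KEY=VALUE",
    )
    args = parser.parse_args(argv)

    options = {}
    for pair in args.project_build_option:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected KEY=VALUE, got %r" % pair)
        # Values stay as opaque strings; only the backend interprets them.
        options[key] = value
    return options


if __name__ == "__main__":
    print(parse_build_options(["--project-build-option", "foo=bar"]))
```

The resulting dict is what would be handed to the build hook as-is.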

I admit that this is another case where the Python API approach is
making things trivial though ;-). If you want to pass arbitrary
user-specified data through a command-line API, while avoiding things
like potential namespace collisions between user-defined switches and
standard-defined switches, then you have to do much more work than
just say "there's another argument that's a dict".


Nathaniel J. Smith -- http://vorpus.org
