[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

Donald Stufft donald at stufft.io
Wed Nov 11 07:29:44 EST 2015


On November 11, 2015 at 4:05:11 AM, Nathaniel Smith (njs at pobox.com) wrote:
> But even this isn't really true -- the difference between them is that
> either way you have a subprocess API, but with a Python API, the
> subprocess interface that pip uses has the option of being improved
> incrementally over time -- including, potentially, to take further
> advantage of the underlying richness of the Python semantics. Sure,
> maybe the first release would just take all exceptions and map them
> into some text printed to stderr and a non-zero return code, and
> that's all that pip would get. But if someone had an idea for how pip
> could do better than this by, I dunno, encoding some structured
> metadata about the particular exception that occurred and passing this
> back up to pip to do something intelligent with it, they absolutely
> could write the code and submit a PR to pip, without having to write a
> new PEP.

I think I prefer a CLI-based approach (my suggestion was to remove the formatting/interpolation altogether and just have the file include a list of things to install, and a Python module to invoke via ``python -m <thing provided by user>``).

The main reason I prefer a CLI-based approach is that I worry about the impedance mismatch between the two systems. We’re not actually going to be able to take advantage of Python’s plethora of types in any meaningful capacity, because at the end of the day the bulk of the data is either naturally a string or, as we start to allow end users to pass options through pip into the build system, something whose type we have no real way of knowing beyond the fact that we got it as a CLI flag. How does a user encode something like “pass an integer into this value in the build system” on the CLI in a generic way? I can’t think of any way, which means that the boundary code in the build system is going to need to be smart enough to handle an array of string arguments that come in via the user typing something on the CLI. We already have a wide variety of libraries that handle that case for building CLI apps, but we do not have a wide array of libraries that handle it for a Python API. It would have to be manually encoded for each and every option that the build system supports.
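To make the point about boundary code concrete, here is a minimal sketch of what a build system's CLI entry point might look like using the standard library's argparse. The program name and the options (--jobs, --debug, --define) are hypothetical and chosen only for illustration; nothing here is a proposed interface.

```python
import argparse


def parse_build_args(argv):
    """Turn raw CLI strings into typed values at the build system boundary."""
    # Hypothetical build system and options, for illustration only.
    parser = argparse.ArgumentParser(prog="example-build")
    # The user types the string "4"; argparse converts it to the int 4.
    parser.add_argument("--jobs", type=int, default=1)
    # A flag with no value becomes a bool.
    parser.add_argument("--debug", action="store_true")
    # A repeatable option is collected into a list of strings.
    parser.add_argument("--define", action="append", default=[])
    return parser.parse_args(argv)


opts = parse_build_args(["--jobs", "4", "--define", "FOO=1"])
print(opts.jobs, opts.debug, opts.define)
```

The type knowledge lives in the build system, next to the option it describes, and pip only ever has to forward a list of strings.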

My other concern is that it introduces another potential area for mistakes, and one that is a bit harder to test. I don’t believe that any sort of “worker.py” script is ever going to be able to handle arbitrary Python values coming back as a return value from a Python function. Whatever serialization we use to send data back into the main pip process (likely JSON) will simply choke and raise an error if it encounters a type it doesn’t know how to serialize. However, this error case will only happen when the build system is being invoked by pip, not when it is being invoked “naturally” in the build system’s unit tests. By forcing build tool authors to write a CLI, we push the work of “how do I serialize my internal data structures” down onto them instead of making it some implicit piece of code that pip needs in order to work.
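A small sketch of the failure mode described above, assuming JSON is the serialization used. The serialize_result helper and the shape of the return value are hypothetical stand-ins for whatever a “worker.py” shim would do; the only real behaviour relied on is that json.dumps raises TypeError for types it does not know.

```python
import json
from pathlib import Path


def serialize_result(result):
    """Hypothetical worker-side step: encode a hook's return value for pip."""
    try:
        return json.dumps(result)
    except TypeError as exc:
        # json.dumps raises TypeError for objects it cannot encode.
        return json.dumps({"error": "unserializable return value: %s" % exc})


# Works in the build system's own unit tests, where nothing is serialized,
# but fails once the value has to cross the process boundary.
natural_result = {"wheel": Path("dist") / "example-1.0-py3-none-any.whl"}
print(serialize_result(natural_result))
print(serialize_result({"wheel": "dist/example-1.0-py3-none-any.whl"}))
```

The first call only fails in the pip-driven path, which is exactly the path the build system's own tests are least likely to exercise.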

The other reason I think a CLI approach is nicer is that it gives us a standard interface that we can use to define errors that the build system can emit. For instance, if we wanted to allow the build system to indicate that it can’t do a build because it’s missing a mandatory C library, that would be trivial to do in a natural way with a CLI approach: we just define an error code and say that if the CLI exits with a 2 then we assume it’s missing a mandatory C library, and we can take additional measures in pip to handle that case. If we use a Python API the natural way to signal an error like that is using an exception… but we don’t have any way to force a standard exception hierarchy on people. There is no “Missing C Library Exception” in Python, so either we’d have to encode some numerical or string-based identifier that we’d inspect an exception for (like Exception().error_code) or we’d need to make a mandatory runtime library that the build systems must use to get their exceptions from. Alternatively, we could have the called functions return exit codes just like a process boundary does; however, that is not natural in Python and is more natural in a language like C.
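Roughly, the calling side of that exit code convention could look like the sketch below. The constant name, the returned labels, and the run_build helper are all hypothetical; the value 2 is just the example used above, and the fake backend is a stand-in child process so the snippet can run on its own.

```python
import subprocess
import sys

# Hypothetical convention: exit status 2 means "missing a mandatory C library".
EXIT_MISSING_C_LIBRARY = 2


def run_build(cmd):
    """Run a build command and map its exit status to a coarse outcome."""
    proc = subprocess.run(cmd)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == EXIT_MISSING_C_LIBRARY:
        return "missing-c-library"
    return "failed"


# Stand-in for a real build backend: a child Python process that exits with 2.
fake_backend = [sys.executable, "-c", "import sys; sys.exit(2)"]
print(run_build(fake_backend))
```

No shared exception hierarchy or runtime library is needed; the build system only has to agree on a small table of integers.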

The main downside to the CLI approach is that it’s harder for the build system to send structured information back to the calling process beyond the defined error codes. However, I do not believe that is particularly difficult, since we can have it do something like send JSON-encoded messages on stdout that pip can process and understand. I don’t think it’s a requirement, or even useful, that the CLI end users would use to invoke the build system directly is the same one that pip would use to invoke it. So we wouldn’t need to worry about the fact that a bunch of JSON blobs being put on stdout isn’t very user friendly, because the user isn’t the target of these commands, pip is.
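One way the JSON-on-stdout idea could be sketched is with one JSON object per line. The message fields ("event", "step", "wheel") and the read_messages helper are made up for illustration, and the child script below stands in for a real build backend; the framing (newline-delimited JSON) is an assumption, not something settled in this thread.

```python
import json
import subprocess
import sys

# Stand-in for a build backend: writes one JSON message per line to stdout.
# The message fields are hypothetical.
CHILD = r'''
import json, sys
for msg in ({"event": "progress", "step": "compile"},
            {"event": "done", "wheel": "example-1.0-py3-none-any.whl"}):
    sys.stdout.write(json.dumps(msg) + "\n")
'''


def read_messages(cmd):
    """Run cmd and decode each non-empty stdout line as a JSON message."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]


messages = read_messages([sys.executable, "-c", CHILD])
for message in messages:
    print(message["event"])
```

Human-readable output could still go to stderr, leaving stdout as the machine-facing channel.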

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



