[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

Nathaniel Smith njs at pobox.com
Wed Nov 11 14:19:44 EST 2015

On Wed, Nov 11, 2015 at 4:29 AM, Donald Stufft <donald at stufft.io> wrote:
> On November 11, 2015 at 4:05:11 AM, Nathaniel Smith (njs at pobox.com) wrote:
>> But even this isn't really true -- the difference between them is that
>> either way you have a subprocess API, but with a Python API, the
>> subprocess interface that pip uses has the option of being improved
>> incrementally over time -- including, potentially, to take further
>> advantage of the underlying richness of the Python semantics. Sure,
>> maybe the first release would just take all exceptions and map them
>> into some text printed to stderr and a non-zero return code, and
>> that's all that pip would get. But if someone had an idea for how pip
>> could do better than this by, I dunno, encoding some structured
>> metadata about the particular exception that occurred and passing this
>> back up to pip to do something intelligent with it, they absolutely
>> could write the code and submit a PR to pip, without having to write a
>> new PEP.
> I think I prefer a CLI based approach (my suggestion was to remove the formatting/interpolation altogether and just have the file include a list of things to install, and a python module to invoke via ``python -m <thing provided by user>``).
> The main reason I think I prefer a CLI based approach is that I worry about the impedance mismatch between the two systems. We’re not actually going to be able to take advantage of Python’s plethora of types in any meaningful capacity because at the end of the day the bulk of the data is either naturally a string or, as we start to allow end users to pass options through pip into the build system, we have no real way of knowing what the type is supposed to be other than the fact we got it as a CLI flag. How does a user encode something like “pass an integer into this value in the build system” on the CLI in a generic way? I can’t think of any way, which means that any boundary code in the build system is going to need to be smart enough to handle an array of arguments that come in via the user typing something on the CLI. We have a wide variety of libraries to handle that case already for building CLI apps but we do not have a wide array of libraries handling it for a Python API. It will have to be manually encoded for each and every option that the build system supports.

You're overcomplicating things :-). The solution to this problem is
just "pip's UI only allows passing arbitrary strings as option values,
so build backends had better deal with it". That's what we'd
effectively be doing anyway in the CLI approach.
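
To make that concrete, here is a minimal sketch of a backend coercing
string-valued options itself. The option names ("jobs", "debug") and
the helper parse_build_options are hypothetical, invented for
illustration; nothing here is proposed API.

```python
def parse_build_options(build_options):
    """Coerce string option values into the types this backend wants.

    Assumption: pip hands over a dict mapping option names to strings.
    The option names below are made up for this example.
    """
    # Every value arrives as a string, so the backend converts explicitly.
    jobs = int(build_options.get("jobs", "1"))
    debug = build_options.get("debug", "false").lower() in ("1", "true", "yes")
    return {"jobs": jobs, "debug": debug}


if __name__ == "__main__":
    print(parse_build_options({"jobs": "4", "debug": "yes"}))
```

This is the same conversion a CLI entry point would have to do, just
written against a dictionary instead of argv.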

> My other concern is that it introduces another potential area for mistake that is a bit harder to test. I don’t believe that any sort of “worker.py” script is ever going to be able to handle arbitrary Python values coming back as a return value from a Python script. Whatever serialization we use to send data back into the main pip process (likely JSON) will simply choke and cause an error if it encounters a type it doesn’t know how to serialize. However this error case will only happen when the build system is being invoked by pip, not when it is being invoked “naturally” in the build system’s unit tests. By forcing build tool authors to write a CLI interface, we push the work of “how do I serialize my internal data structures” down onto them instead of making it some implicit piece of code that pip needs to work.

I think this is another issue that isn't actually a problem. Remember,
we don't need to support translating arbitrary Python function calls
across process boundaries; there will be a fixed, finite set of
methods that we need to support, and those methods' semantics will be
defined in a PEP.

So e.g., if the PEP says that build backends should define a method like this:

    def build_requirements(self, build_options):
        """Calculate the dynamic portion of the build-requirements.

        :param build_options: The build options dictionary.
        :returns: A list of strings, where each string is a PEP XX
            requirement specifier.
        """

then our IPC mechanism doesn't need to be able to handle arbitrary
types as return values; it only needs to be able to handle a list of
strings. The sketch I sent already handles that, so we're good. And the
build tool's unit tests will be checking that it returns a list of
strings, because... that's what unit tests do, they validate that
methods implement the interface that they're defined to implement :-).
So this is a non-problem -- we just have to make sure when we define
the various method interfaces in the PEP that we don't have any
methods that return arbitrary complicated Python types. Which we
weren't going to be tempted to do anyway.
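
For illustration only, here is roughly what handling that one fixed
return type could look like, assuming JSON as the wire format. This is
not the sketch referred to above; ExampleBackend, call_hook, read_reply
and check_list_of_strings are hypothetical names, and the requirement
strings are placeholders.

```python
import json


class ExampleBackend:
    """Hypothetical backend implementing the hook described above."""

    def build_requirements(self, build_options):
        return ["setuptools", "wheel"]


def check_list_of_strings(value):
    # The interface promises a list of strings, so that is all we accept.
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise TypeError("build_requirements must return a list of strings")
    return value


def call_hook(backend, build_options):
    """Worker side: call the hook and serialize its result as JSON."""
    result = check_list_of_strings(backend.build_requirements(build_options))
    return json.dumps(result)


def read_reply(payload):
    """pip side: decode the worker's reply and re-check its shape."""
    return check_list_of_strings(json.loads(payload))


if __name__ == "__main__":
    print(read_reply(call_hook(ExampleBackend(), {})))
```

The same check_list_of_strings test is something a backend's own unit
tests could run directly, without pip in the loop.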


Nathaniel J. Smith -- http://vorpus.org
