On Sun, Nov 8, 2015 at 5:28 PM, Robert Collins
+The use of a command line API rather than a Python API is a little
+contentious. Fundamentally anything can be made to work, and Robert wants to
+pick something thats sufficiently lowest common denominator that
+implementation is straight forward on all sides. Picking a CLI for that makes
+sense because all build systems will need a CLI for end users to use anyway.

I agree that this is not terribly important, and anything can be made to work. Having pondered it all for a few more weeks, though, I think that the "entrypoints-style" interface actually is unambiguously better, so let me see about making that case.

What's at stake?
----------------------

Option 1, as in Robert's PEP:

The build configuration file contains a string like "flit --dump-build-description" (or whatever), which names a command to run, and then a protocol for running this command to get information on the actual build system interface. Build operations are performed by executing these commands as subprocesses.

Option 2, my preference:

The build configuration file contains a string like "flit:build_system_api" (or whatever) which names a Python object accessed like

    import flit
    flit.build_system_api

(This is the same syntax used for naming entry points.) This object would then have attributes and methods describing the actual build system interface. Build operations are performed by calling these methods.

Why does it matter?
----------------------------

First, to be clear: I think that no matter which choice we make here, the final actual execution path is going to end up looking very similar. Because even if we go with the entry-point-style Python hooks, the build frontends like pip will still want to spawn a child to do the actual calls -- this is important for isolating pip from the build backend and the build backend from pip, it's important because the build backend needs to execute in a different environment than pip itself, etc. So no matter what, we're going to have some subprocess calls and some IPC.

The difference is that in the subprocess approach, the IPC machinery is all written into the spec, and build frontends like pip implement one half while build backends implement the other half. In the Python API approach, the spec just specifies the Python calling conventions, and both halves of the IPC code live inside each build frontend.
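
To make Option 2 a bit more concrete, here is a minimal sketch of how a frontend might turn a string like "flit:build_system_api" into the object it names. The function name resolve_hook_object and its error handling are made up for illustration; nothing here is taken from the PEP or from pip. Since flit may not be installed, the demo resolves a stdlib object instead.

```python
import importlib


def resolve_hook_object(spec):
    """Resolve an entry-point-style string "module:attr" to a Python object.

    Hypothetical helper: the name and the error handling are assumptions
    made for this sketch, not part of any spec.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError("expected 'module:object', got %r" % (spec,))
    # Equivalent to "import flit" for a spec like "flit:build_system_api".
    obj = importlib.import_module(module_name)
    # Entry-point syntax allows a dotted attribute path after the colon.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


if __name__ == "__main__":
    # Stand-in for "flit:build_system_api", using something always importable.
    join = resolve_hook_object("os.path:join")
    print(join("a", "b"))
```

In the Python API approach this lookup would run inside the child process, in the build environment, rather than inside pip itself.
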

Concretely, the way I imagine this would work is that pip would set up the build environment, and then it would run

    build-environment/bin/python path/to/pip-worker-script.py <args>

where pip-worker-script.py is distributed as part of pip. (In simple cases it could simply be a file inside pip's package directory; if we want to support execution from pip-inside-a-zip-file then we need a bit of code to unpack it to a tempfile before executing it. Creating a tempfile is not a huge additional burden given that by the time we call build hooks we will have already created a whole temporary python environment...)

In the subprocess approach, we have to write a ton of text describing all the intricacies of IPC. We have to specify how the command line gets split (or is it passed to the shell?), and specify a JSON-based protocol, and what happens to stdin/stdout/stderr, etc. etc. In the Python API approach, we still have to do all the work of figuring these things out, but they would live inside pip's code, instead of in a PEP. The actual PEP text would be much smaller.

It's not clear which approach leads to smaller code overall. If there are F frontends and B backends, then in the subprocess approach we collectively have to write F+B pieces of IPC code, and in the Python API approach we collectively have to write 2*F pieces of IPC code. So on this metric the Python API is a win if F < B, which would happen if e.g. everyone ends up using pip for their frontend but with lots of different backends, which seems plausible? But who knows.

But now suppose that there's some bug in that complicated IPC protocol (which I would rate as about a 99.3% likelihood in our first attempt, because cross-platform compatible cross-process IPC is super annoying and fiddly). In the subprocess approach, fixing this means that we need to (a) write a PEP, and then (b) fix F+B pieces of code simultaneously on some flag day, and possibly test F*B combinations for correct interoperation.
In the Python API approach, fixing this means patching whichever frontend has the bug, no PEPs or flag days necessary.

In addition, the ability to evolve the two halves of the IPC channel together allows for better efficiency. For example, in Robert's current PEP there's some machinery added that hopes to let pip cache the result of the "--dump-build-description" call. This is needed because in the subprocess approach, the minimum number of subprocess calls you need to do something is two: one to ask what command to call, and a second to actually execute the command. In the Python API approach, you can just go ahead and spawn a subprocess that knows what method it wants to call, and it can locate that method and then call it in a single shot, thus avoiding the need for an error-prone caching scheme.

And the flexibility also helps in the face of future changes, too. Like, suppose that we start out with a do_build hook, and then later add a do_build2 hook that takes an extra argument or something, and pip wants to call do_build2 if it exists, and fall back on do_build otherwise. In the subprocess approach, you have to get the build description, check which hooks are provided, and then once you've decided which one you want to call you can spawn a second subprocess to do that. In the Python API approach, pip can move this fallback logic directly into its hook-calling worker. (If it wants to.) So it still avoids the extra subprocess call.

Finally, I think that it probably is nicer for pip to bite the bullet and take on more of the complexity budget here in order to make things simpler for build backends, because pip is already a highly complex project that undergoes lots of scrutiny from experts, which is almost certainly not going to be as true for all build backends. And the Python API approach is dead simple to explain and implement for the build backend side.
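
As a rough illustration of the last few points, here is a toy version of the whole round trip: a backend that is nothing but an object with a do_build method, and a frontend-owned worker that locates the hook, prefers do_build2 when it exists, and falls back to do_build, all in one subprocess call. Everything named here is hypothetical: the toy_backend module, the worker script, the argument lists of do_build and do_build2, and the choice of JSON on stdout for the result. sys.executable stands in for build-environment/bin/python.

```python
import json
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical backend: all a backend author would need to write.
BACKEND_SOURCE = textwrap.dedent('''
    class _Api:
        def do_build(self, source_dir):
            return {"built": source_dir, "hook": "do_build"}

    build_system_api = _Api()
''')

# Hypothetical worker script, shipped with and private to the frontend.
WORKER_SOURCE = textwrap.dedent('''
    import importlib, json, sys

    spec, source_dir = sys.argv[1], sys.argv[2]
    module_name, _, attr = spec.partition(":")
    api = getattr(importlib.import_module(module_name), attr)
    # Fallback logic lives in the worker: prefer the newer hook if present.
    hook = getattr(api, "do_build2", None)
    if hook is not None:
        result = hook(source_dir, {})  # extra argument is an assumption
    else:
        result = api.do_build(source_dir)
    # How results travel back is the frontend's own business, not the spec's.
    json.dump(result, sys.stdout)
''')


def call_build_hook(spec, source_dir):
    """Run the build hook named by spec in a single child process."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Written next to the worker so the child can import it.
        (tmp_path / "toy_backend.py").write_text(BACKEND_SOURCE)
        worker = tmp_path / "worker.py"
        worker.write_text(WORKER_SOURCE)
        proc = subprocess.run(
            [sys.executable, str(worker), spec, source_dir],
            cwd=tmp, stdout=subprocess.PIPE, check=True,
        )
    return json.loads(proc.stdout.decode("utf-8"))


if __name__ == "__main__":
    print(call_build_hook("toy_backend:build_system_api", "src"))
```

If the stdout/JSON convention in the worker turned out to be buggy, only the frontend would need patching; the backend half would stay exactly as it is.
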
I understand that the pip devs who are reading this might disagree, which is why I also wrote down the (IMO) much more compelling arguments above :-). But hey, still worth mentioning...

-n

--
Nathaniel J. Smith -- http://vorpus.org