[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

Nathaniel Smith njs at pobox.com
Wed Nov 11 13:38:14 EST 2015

On Nov 11, 2015 5:30 AM, "Paul Moore" <p.f.moore at gmail.com> wrote:
> On 10 November 2015 at 22:44, Nathaniel Smith <njs at pobox.com> wrote:
> > "Stdin is unspecified, and stdout/stderr can be used for printing
> > status messages, errors, etc. just like you're used to from every
> > other build system in the world."
> This is over simplistic.
> We have real-world requirements from users of pip that they *don't*
> want to see all of the progress that the various build tools invoke.
> That is not something we can ignore. We also have some users saying
> they want access to all of the build tool output. And we also have a
> requirement for progress reporting.

Have you tried current dev versions of pip recently? The default now is to
suppress the actual output but for progress reporting to show a spinner
that rotates each time a line of text would have been printed. It's low
tech but IMHO very effective. (And obviously you can also flip a switch to
either see all or nothing of the output as well, or if that isn't there now
it should really be added.) So I kinda feel like these are solved problems.
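
To make the spinner idea concrete, here is a minimal sketch of the technique. It is not pip's actual code; the function name run_with_spinner and its parameters are made up for illustration. The child's output is captured rather than shown, and the spinner advances one frame per captured line.

```python
import io
import itertools
import subprocess
import sys


def run_with_spinner(cmd, out=sys.stderr):
    """Run cmd, hide its output, and advance a spinner once per output line.

    Hypothetical helper for illustration only; returns (returncode, output).
    """
    spinner = itertools.cycle("|/-\\")
    captured = []
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so every line ticks the spinner
        text=True,
    )
    for line in proc.stdout:
        captured.append(line)  # keep the text in case the user asks to see it
        out.write("\r" + next(spinner))
        out.flush()
    returncode = proc.wait()
    out.write("\r \r")  # erase the spinner character
    out.flush()
    return returncode, "".join(captured)
```

A frontend could print the captured text only when the return code is non-zero, which covers the "show everything on failure" case without any structured protocol.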

> Taking all of those requirements into account, pip *has* to have some
> level of control over the output of a build tool - with setuptools at
> the moment, we have no such control (other than "we may or may not
> show the output to the user") and that means we struggle to
> realistically satisfy all of the conflicting requirements we have.
> So we do need much better defined contracts over stdin, stdout and
> stderr, and return codes. This is true whether or not the build system
> is invoked via a Python API or a CLI.

Even if you really do want to define a generic structured system for build
progress reporting (it feels pretty second-systemy to me), then in the
Python API approach there are better options than trying to define a
specific protocol on stdout.

Guaranteeing a clean stdout/stderr is hard: it means you have to be careful
to correctly capture and process the output of every child you invoke (e.g.
compilers), and deal correctly with the tricky aspects of pipes (deadlocks,
SIGPIPE, ...). And even then you can get thwarted by accidentally importing
the wrong library into your main process, and discovering that it writes
directly to stdout/stderr on some error condition, and that it may or may
not respect your resetting of sys.stdout/sys.stderr at the Python level. So
to be really reliable the only thing to do is to create some pipes and some
threads to read the pipes and do the dup2 dance (but not everyone will
actually do this, they'll just accept corrupted output on errors). And ugh,
all of this is a huge hassle that massively raises the bar on implementing
simple build systems.
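
For readers who haven't seen it, the "dup2 dance" looks roughly like the sketch below. It assumes a POSIX-like platform, and call_with_captured_fds is a made-up name. It redirects file descriptors 1 and 2 themselves, so it also catches output from C libraries and child processes that never look at sys.stdout.

```python
import os
import sys
import threading


def call_with_captured_fds(func):
    """Run func() with fds 1 and 2 redirected into a pipe.

    Hypothetical helper; returns (func's result, captured bytes).
    """
    sys.stdout.flush()
    sys.stderr.flush()
    read_fd, write_fd = os.pipe()
    saved_out, saved_err = os.dup(1), os.dup(2)
    chunks = []

    def drain():
        # A reader thread avoids deadlock when output exceeds the pipe buffer.
        while True:
            data = os.read(read_fd, 4096)
            if not data:  # EOF: every write end has been closed
                break
            chunks.append(data)

    reader = threading.Thread(target=drain)
    reader.start()
    result = None
    try:
        os.dup2(write_fd, 1)
        os.dup2(write_fd, 2)
        result = func()
    finally:
        # Push Python-level buffers into the pipe before restoring.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        os.close(write_fd)  # last write end gone, so the reader sees EOF
        os.close(saved_out)
        os.close(saved_err)
        reader.join()
        os.close(read_fd)
    return result, b"".join(chunks)
```

Even this much code skips signal handling and Windows, which is rather the point: it is a lot to ask of every simple build backend.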

In the subprocess approach you don't really have many options; if you want
live feedback from a build process then you have to get it somehow, and you
can't just say "fine, part of the protocol is that we use fd 3 for
structured status updates" because that doesn't work on Windows.

In the Python API approach, we have better options, though. The way I'd do
this is to define some kind of abstract progress-reporting interface, like

  class BuildUpdater:
      # pass -1 for "unknown"
      def set_total_steps(self, n):
          raise NotImplementedError

      # if total is unknown, call this repeatedly to say "something's happening"
      def set_current_step(self, n):
          raise NotImplementedError

      def alert_user(self, message):
          raise NotImplementedError

And methods like build_wheel would accept an object implementing this
interface as an argument. Stdout/stderr keep the same semantics as they
have today; this is a separate, additional channel.
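
From the build backend's side, that might look like the sketch below. The build_wheel signature here is purely hypothetical (nothing in this thread fixes it), and RecordingUpdater is a toy stand-in for whatever object the frontend passes in.

```python
class RecordingUpdater:
    """Toy updater that just remembers the calls it receives."""

    def __init__(self):
        self.calls = []

    def set_total_steps(self, n):
        self.calls.append(("total", n))

    def set_current_step(self, n):
        self.calls.append(("step", n))

    def alert_user(self, message):
        self.calls.append(("alert", message))


def build_wheel(sources, updater):
    """Hypothetical build hook: pretends to compile each source file."""
    updater.set_total_steps(len(sources))
    for i, source in enumerate(sources, start=1):
        # Ordinary chatter still goes to stdout, exactly as it does today.
        print("compiling", source)
        updater.set_current_step(i)
    return "example-0.0-py3-none-any.whl"  # made-up wheel filename
```

The backend never needs to know whether the updater draws a progress bar, writes to a pipe, or does nothing at all.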

And then a build frontend could decide how it wants to actually implement
this interface. A simple frontend that didn't want to implement fancy UI
stuff might just have each of those methods print something to stderr to be
captured along with the rest of the chatter. A fancier frontend like pip
could pick whichever IPC mechanism they like best and implement that inside
their worker. (E.g., maybe on POSIX we use fd 3, and on Windows we do
incremental writes to a temp file, or use a named pipe. Or maybe we prefer
to stick to using stdout for pip<->worker communication, and the worker
would take the responsibility of robustly redirecting stdout via dup2
before invoking the actual build hook. There are lots of options; the
beauty of the approach, again, is that we don't have to pick one now and
write it in stone.)
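
As a rough sketch of those two kinds of frontend (the class names and the JSON message format are invented here, not a proposal): one implementation prints human-readable lines, the other serializes each call as a JSON line onto whatever channel the frontend's worker happens to use.

```python
import io
import json
import sys


class PlainTextUpdater:
    """Simple frontend: print each update as text (stderr by default)."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr

    def set_total_steps(self, n):
        total = "unknown" if n == -1 else n
        print(f"[build] total steps: {total}", file=self.stream)

    def set_current_step(self, n):
        print(f"[build] step {n}", file=self.stream)

    def alert_user(self, message):
        print(f"[build] NOTE: {message}", file=self.stream)


class JsonLinesUpdater:
    """Fancier frontend: forward each update as one JSON line.

    `channel` is any writable text stream: a file opened on fd 3, a temp
    file, a named pipe, and so on. The message format is an assumption.
    """

    def __init__(self, channel):
        self.channel = channel

    def _send(self, event, value):
        self.channel.write(json.dumps({"event": event, "value": value}) + "\n")
        self.channel.flush()  # flush so the parent sees updates live

    def set_total_steps(self, n):
        self._send("total_steps", n)

    def set_current_step(self, n):
        self._send("current_step", n)

    def alert_user(self, message):
        self._send("alert", message)
```

Swapping one for the other changes nothing on the build backend's side, which is what lets the transport decision be deferred.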
