On Nov 11, 2015 5:30 AM, "Paul Moore" <p.f.moore@gmail.com> wrote:
>
> On 10 November 2015 at 22:44, Nathaniel Smith <njs@pobox.com> wrote:
> > "Stdin is unspecified, and stdout/stderr can be used for printing
> > status messages, errors, etc. just like you're used to from every
> > other build system in the world."
>
> This is over simplistic.
>
> We have real-world requirements from users of pip that they *don't*
> want to see all of the progress that the various build tools invoke.
> That is not something we can ignore. We also have some users saying
> they want access to all of the build tool output. And we also have a
> requirement for progress reporting.

Have you tried current dev versions of pip recently? The default now is to suppress the actual output, but for progress reporting it shows a spinner that rotates each time a line of text would have been printed. It's low tech but IMHO very effective. (And obviously you can also flip a switch to see either all or none of the output, or if that isn't there now it could easily be added.) So I kinda feel like these are solved problems.
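To make the spinner idea concrete, here is a rough sketch of the technique. This is not pip's actual code; spin_per_line and its arguments are names I made up for illustration. It swallows the build output line by line and draws one spinner frame per line:

```python
import itertools


def spin_per_line(lines, out):
    """Consume build output, drawing one spinner frame per line.

    `lines` is any iterable of output lines (e.g. a subprocess pipe);
    `out` is the stream the spinner is drawn on. Returns the captured
    lines so they can still be shown if the build fails.
    """
    frames = itertools.cycle("|/-\\")
    captured = []
    for line in lines:
        captured.append(line)
        # "\r" moves back to the start of the line so each frame
        # overwrites the previous one on a terminal.
        out.write("\r" + next(frames))
        out.flush()
    out.write("\rdone\n")
    return captured
```

The captured lines are kept around on purpose: a frontend doing this would presumably want to dump them when the build exits with an error.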

> Taking all of those requirements into account, pip *has* to have some
> level of control over the output of a build tool - with setuptools at
> the moment, we have no such control (other than "we may or may not
> show the output to the user") and that means we struggle to
> realistically satisfy all of the conflicting requirements we have.
>
> So we do need much better defined contracts over stdin, stdout and
> stderr, and return codes. This is true whether or not the build system
> is invoked via a Python API or a CLI.

Even if you really do want to define a generic structured system for build progress reporting (it feels pretty second-systemy to me), then in the python api approach there are better options than trying to define a specific protocol on stdout.

Guaranteeing a clean stdout/stderr is hard: it means you have to be careful to correctly capture and process the output of every child you invoke (e.g. compilers), and deal correctly with the tricky aspects of pipes (deadlocks, SIGPIPE, ...). And even then you can get thwarted by accidentally importing the wrong library into your main process and discovering that it writes directly to stdout/stderr on some error condition, and it may or may not respect your resetting of sys.stdout/sys.stderr at the Python level. So to be really reliable the only thing to do is to create some pipes and some threads to read the pipes and do the dup2 dance (but not everyone will actually do this; they'll just accept corrupted output on errors). And ugh, all of this is a huge hassle that massively raises the bar on implementing simple build systems.
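For anyone who hasn't had to do it, the "dup2 dance" looks roughly like the following. This is a minimal sketch, not a hardened implementation; capture_fd is a hypothetical helper name, and it ignores the harder cases (children that outlive the call, signals, etc.):

```python
import os
import sys
import threading


def capture_fd(fd, func):
    """Run func() with OS-level descriptor `fd` redirected into a pipe.

    Returns (result_of_func, captured_bytes). Because the redirect is
    done at the fd level, it also catches writes from C libraries and
    child processes that bypass sys.stdout/sys.stderr.
    """
    def flush_python_streams():
        for stream in (sys.stdout, sys.stderr):
            if stream is not None:
                stream.flush()

    flush_python_streams()
    read_end, write_end = os.pipe()
    saved = os.dup(fd)  # remember where fd pointed originally
    chunks = []

    def drain():
        # Read on a separate thread so a chatty build cannot fill the
        # pipe buffer and deadlock.
        while True:
            data = os.read(read_end, 4096)
            if not data:  # EOF: every copy of the write end is closed
                break
            chunks.append(data)

    reader = threading.Thread(target=drain)
    reader.start()
    os.dup2(write_end, fd)  # fd now points at the pipe
    os.close(write_end)
    try:
        result = func()
    finally:
        flush_python_streams()
        os.dup2(saved, fd)  # restore; this closes the last write end
        os.close(saved)
        reader.join()
        os.close(read_end)
    return result, b"".join(chunks)
```

Even this small version needs a pipe, a thread, and careful ordering of the flush/dup2/close calls, which is the hassle I mean.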

In the subprocess approach you don't really have many options; if you want live feedback from a build process then you have to get it somehow, and you can't just say "fine, part of the protocol is that we use fd 3 for structured status updates" because that doesn't work on Windows.

In the python api approach, we have better options, though. The way I'd do this is to define some kind of abstract progress-reporting interface, like

class BuildUpdater:
    # pass -1 for "unknown"
    def set_total_steps(self, n):
        pass

    # if total is unknown, call this repeatedly to say "something's happening"
    def set_current_step(self, n):
        pass

    def alert_user(self, message):
        pass

And methods like build_wheel would accept an object implementing this interface as an argument. Stdout/stderr keep the same semantics as they have today; this is a separate, additional channel.

And then a build frontend could decide how it wants to actually implement this interface. A simple frontend that didn't want to implement fancy UI stuff might just have each of those methods print something to stderr to be captured along with the rest of the chatter. A fancier frontend like pip could pick whichever IPC mechanism it likes best and implement that inside its worker. (E.g., maybe on POSIX we use fd 3, and on Windows we do incremental writes to a temp file, or use a named pipe. Or maybe we prefer to stick to using stdout for pip<->worker communication, and the worker would take the responsibility of robustly redirecting stdout via dup2 before invoking the actual build hook. There are lots of options; the beauty of the approach, again, is that we don't have to pick one now and set it in stone.)
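As a sketch of the "simple frontend" case, assuming the interface above: StderrBuildUpdater and the message formats are made up for illustration, and the build_wheel here is just a stand-in for a real build hook, not a proposed signature.

```python
import sys


class StderrBuildUpdater:
    """Hypothetical simple frontend: every update becomes a stderr line."""

    def __init__(self, stream=None):
        self._stream = stream if stream is not None else sys.stderr
        self._total = -1  # -1 means "unknown", as in the interface sketch

    def set_total_steps(self, n):
        self._total = n

    def set_current_step(self, n):
        if self._total < 0:
            self._stream.write("build: working (step %d)\n" % n)
        else:
            self._stream.write("build: step %d of %d\n" % (n, self._total))

    def alert_user(self, message):
        self._stream.write("build: %s\n" % message)


def build_wheel(updater):
    # Stand-in for a real build hook; it only reports progress.
    updater.set_total_steps(2)
    updater.set_current_step(1)
    updater.alert_user("compiling extension")
    updater.set_current_step(2)
```

The build backend only ever sees the three methods, so a fancier frontend can swap in an object that forwards the same calls over whatever channel it prefers without the backend changing at all.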

-n