[Distutils] PEP 517 - specifying build system in pyproject.toml

Nick Coghlan ncoghlan at gmail.com
Tue May 23 11:20:47 EDT 2017


On 23 May 2017 at 22:41, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> On Tue, May 23, 2017, at 12:56 PM, Paul Moore wrote:
> Can I take a quick poll of what people following this topic think?
>
> Q1: Default encoding for captured build stdout/stderr
> a. UTF-8 (consistent, can represent any character)
> b. Locale default (convenient if backend runs subprocesses which produce
> output in the locale encoding)
>
> Q2: Handling unknown encodings from subprocesses
> a. Backend should ensure all output is valid in the target encoding
> (Q1), though it may not be accurate.
> b. Unknown output may be passed on as bytes without transcoding, so the
> frontend can e.g. dump it to a file.

Up to this point, I've been in favour of both 1b and 2b, since they're
the main options that allow a build backend to get itself out of the
way entirely and let the front-end deal with the problem, rather than
having to figure out encoding issues itself. pip already has to deal
with the "arbitrarily encoded data" problem for the current setup.py
invocation, and whatever solution is adopted there should suffice for
PEP 517 as well.
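For illustration, here is a minimal sketch of what options 1b and 2b
leave to the frontend. It assumes the backend runs as a subprocess
whose combined stdout/stderr the frontend captures as raw bytes, and
it needs Python 3.5+. The function name `run_backend_and_capture` and
its `log_path` parameter are hypothetical; this is not pip's actual
implementation. The raw bytes are kept untouched for a log file, while
the text shown to the user is decoded with the locale encoding and the
"backslashreplace" error handler, so undecodable bytes show up as
`\xff`-style escapes instead of raising.

```python
import locale
import subprocess


def run_backend_and_capture(cmd, log_path=None):
    """Hypothetical frontend helper: run a backend command, return (rc, text)."""
    # Capture stdout and stderr together as raw bytes; the frontend does not
    # decode anything at capture time, so no output can fail to be captured.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    raw = result.stdout
    if log_path is not None:
        # Option 2b: keep the untranscoded bytes, e.g. for a build log.
        with open(log_path, "wb") as log_file:
            log_file.write(raw)
    # Option 1b: decode for display using the locale's preferred encoding,
    # escaping undecodable bytes instead of raising UnicodeDecodeError.
    encoding = locale.getpreferredencoding(False)
    text = raw.decode(encoding, errors="backslashreplace")
    return result.returncode, text
```

The display text may not be accurate if the backend's real encoding
differs from the locale, but the logged bytes always are.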

If PEP 426 taught me anything, it was that if you weren't planning to
write something yourself, and didn't have the budget to pay someone
else to write it for you, your best bet is to adhere as closely to the
status quo as you can while still incorporating the 100% essential
changes that you actually need. (A Zen of Python style aphorism for
that: "The right way and the easy way should be the same way")

To be honest, I still think that's likely to be the right way to go
for PEP 517, and it will take some convincing that we're going to be
able to persuade future backend developers who personally couldn't
care less about encoding issues to adopt anything more complex.

However, I also realised that there's a potential third way to handle
this problem: design a Python-level API that allows frontends to use
more structured data formats (e.g. JSON) for communication between the
frontend and its backend shim.

In particular, I'm thinking we could move the current
"config_settings" dict onto something like a "build context" object
that, *even in Python 2*, offers a Unicode "outstream" and
"errstream", which the backend is then expected to use rather than
writing to sys.stdout/err directly. That context could also provide a
Python 3 style "run()" API for subprocess invocation that implemented
the preferred stream handling behaviour (including applying the
"backslashreplace" error handler regardless of Python version).
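A minimal sketch of such a build context, written for Python 3 only
(the Python 2 variant the proposal calls for is not shown). The class
name `BuildContext`, the `check` parameter and `example_backend_hook`
are hypothetical illustrations, not part of PEP 517 or any existing
frontend; only `config_settings`, `outstream`, `errstream` and `run()`
echo names used above. Here `run()` captures subprocess output as
bytes and decodes it with the locale encoding plus "backslashreplace"
before writing it to the context's text streams, so the backend code
only ever handles `str`.

```python
import locale
import subprocess
import sys


class BuildContext:
    """Hypothetical frontend-provided context handed to a build backend.

    Sketch only: the frontend owns every encoding decision, and the backend
    only reads config_settings and writes str to outstream/errstream.
    """

    def __init__(self, config_settings=None, outstream=None, errstream=None):
        self.config_settings = dict(config_settings or {})
        # Text streams chosen by the frontend (defaulting to this process's).
        self.outstream = sys.stdout if outstream is None else outstream
        self.errstream = sys.stderr if errstream is None else errstream

    def run(self, args, check=False):
        """Run a subprocess and relay its output to the context streams as text."""
        result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Decode with the locale encoding, escaping undecodable bytes
        # ("backslashreplace") rather than raising UnicodeDecodeError.
        encoding = locale.getpreferredencoding(False)
        self.outstream.write(result.stdout.decode(encoding, errors="backslashreplace"))
        self.errstream.write(result.stderr.decode(encoding, errors="backslashreplace"))
        if check and result.returncode != 0:
            raise subprocess.CalledProcessError(result.returncode, args)
        return result.returncode


def example_backend_hook(context):
    """Hypothetical backend code: it only ever handles str, never bytes."""
    context.outstream.write("running compiler\n")
    return context.run([sys.executable, "-c", "print('compiled')"])
```

A frontend that wants everything in memory could pass `io.StringIO()`
objects as the two streams. Note that this sketch captures stdout and
stderr separately, so their relative interleaving is not preserved.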

That way, instead of trying to hit build backend developers with a
fairly flimsy stick ("Thou shalt comply with the specification or some
other open source developers may say mildly disapproving things about
you on the internet"), we'd instead be offering them the easy way out
of letting the front-end provided build context deal with all the
messy encoding issues.

Taking that approach of just defining a helper API and expecting build
backends to either use it or emulate it gives us some quite attractive
properties:

- backends themselves deal entirely in Unicode, not bytes
- frontends get full control of the communication format used between
the frontend and its backend shim - they're not restricted to plain
text
- the Python 2/3 differences can be handled in the frontend CLI shims,
rather than every backend needing to do it
- we don't need to enshrine any particular encoding handling behaviour
in the spec, we can let it be a quality of implementation issue for
the front-end tools
- platform specific tools can make platform specific choices
- tools can adapt to new platforms without requiring a specification update
- tools can update their default behaviour as other considerations
change (e.g. the possible introduction of locale coercion and
PYTHONUTF8 mode in 3.7)
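To make the "full control of the communication format" point
concrete, here is a sketch assuming a hypothetical frontend whose
backend shim runs in a subprocess and hands the backend stream objects
that serialise every write as one JSON object per line on the shim's
binary stdout. `JsonLineStream`, `read_messages` and the
`"stream"`/`"text"` message fields are invented for illustration.
Because `json.dumps` escapes non-ASCII characters by default
(`ensure_ascii=True`), the bytes on the pipe are pure ASCII, so
neither side needs to agree on a locale encoding.

```python
import json


class JsonLineStream:
    """Hypothetical text stream a shim could pass as outstream/errstream.

    Each write() becomes one JSON object per line on a binary channel. With
    json.dumps' default ensure_ascii=True the encoded bytes are pure ASCII.
    """

    def __init__(self, name, channel):
        self.name = name          # e.g. "out" or "err"
        self.channel = channel    # binary file-like object, e.g. sys.stdout.buffer

    def write(self, text):
        message = {"stream": self.name, "text": text}
        # Newlines inside text are escaped by json.dumps, so one message = one line.
        self.channel.write(json.dumps(message).encode("ascii") + b"\n")
        return len(text)

    def flush(self):
        self.channel.flush()


def read_messages(lines):
    """Frontend side: turn received JSON lines back into (stream, text) pairs."""
    for line in lines:
        if line.strip():
            message = json.loads(line.decode("ascii"))
            yield message["stream"], message["text"]
```

Lone surrogates (for example from `surrogateescape`-decoded data) are
also escaped by `json.dumps`, so they survive the round trip instead of
causing an encoding error on the pipe.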

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

