[Distutils] PEP 517 - specifying build system in pyproject.toml

Nick Coghlan ncoghlan at gmail.com
Sun May 21 01:36:22 EDT 2017


On 21 May 2017 at 02:36, Steve Dower <steve.dower at python.org> wrote:
> On 20May2017 0820, Nick Coghlan wrote:
>>
>> Good point regarding the fact that the Windows 16-bit APIs only come
>> into play for interactive sessions (even in 3.6+), while for PEP 517
>> we're specifically interested in the 8-bit pipes used to communicate
>> with build subprocesses launched by an installation tool.
>
>
> I need to catch up on the PEP (and thanks Brett for alerting me to the
> thread), but this comment in particular cements the mental diagram I have
> right now:
>
> (build UI) <--> (build tool) <--> (compiler)
> ( Python ) <--> (  Python  ) <--> (anything)
>
> I'll probably read the PEP closely and see that this is entirely incorrect,
> but if it's right:
>
> * encoding for text between the build UI and build tool should just be
> specified once for all platforms (i.e. use UTF-8).
> * encoding for text between build tool and the compiler depends on the
> compiler

Alas, it isn't quite that simple. Let's take the current de facto standard case:


    (user console/CI build log) <-> pip <-> setup.py (distutils/setuptools) <-> 3rd party tool

Key usability feature:

* when requested, informational messages from 3rd party tools SHOULD
be made available to the end user for debugging purposes

Ideal outcome:

* everything that makes it to the user console or CI build log is
readable by the end user

Essential requirement:

* encoding problems in informational messages emitted by 3rd party
tools MUST NOT cause the build to fail

Now, the easiest way to handle the essential requirement as the author
of an installation or build tool is to choose not to deal with it:
instead, you just treat the output from further downstream as opaque
binary data, and let the user console/CI build log layer deal with any
encoding problems as they see fit. You may end up with some build
failures that are a pain to debug because you're getting nonsense from
the build pipeline, but you won't fail your build *because* some
particular build tool emitted improperly encoded nonsense.
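For illustration, here is a minimal Python 3 sketch of that "opaque binary data" approach. It assumes the installation tool launches the build step as a subprocess. The function name relay_build_output and its out parameter are hypothetical, not part of pip or PEP 517. The bytes are never decoded, so nothing in them can raise an encoding error:

```python
import subprocess
import sys


def relay_build_output(cmd, out=None):
    # Hypothetical helper: run a build step and copy its output verbatim
    if out is None:
        out = sys.stdout.buffer  # binary layer under the console/CI log
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    # read1() returns whatever bytes are available (b"" at EOF); they are
    # never decoded, so badly encoded messages can't raise here
    for chunk in iter(lambda: proc.stdout.read1(4096), b""):
        out.write(chunk)
        out.flush()
    proc.stdout.close()
    return proc.wait()
```

Calling relay_build_output([sys.executable, "setup.py", "bdist_wheel"]) would copy everything the build emits to the console unchanged. Any decoding is left to the terminal or CI log viewer.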

That all changes if we *require* UTF-8 on the link between the
installation tool (e.g. pip) and the build tool (e.g. setup.py). If we
do that:

* the installation tool can't just pass along build tool output to the
user console or CI build log any more; it has a nominal obligation to
try to interpret it as UTF-8
* the build tool (or build tool shim) can't just pass along 3rd party
tool output to the installation tool any more; it has a nominal
obligation to try to get it to emit UTF-8
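As a rough illustration of the second obligation, a shim could transcode the 3rd party output to UTF-8 before forwarding it. The sketch below is hypothetical: transcode_to_utf8 is not an existing API. It assumes the tool wrote in the locale's preferred encoding, which is exactly the kind of guess that makes this obligation a burden:

```python
import locale


def transcode_to_utf8(data, source_encoding=None):
    # Assumption: the 3rd party tool wrote its messages in the locale's
    # preferred encoding; that guess can easily be wrong
    if source_encoding is None:
        source_encoding = locale.getpreferredencoding()
    # backslashreplace keeps undecodable bytes visible instead of raising
    # (decoding with it needs Python 3.5+)
    text = data.decode(source_encoding, "backslashreplace")
    return text.encode("utf-8")
```

Decoding arbitrary chunks of a stream independently can split a multi-byte character. An incremental decoder from codecs.getincrementaldecoder() would avoid that.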

Now, *particular* installation and build tools may want to strongly
encourage the use of UTF-8 in an effort to get closer to the ideal
outcome, but that isn't the key objective of PEP 517: the key
objective of PEP 517 is to make it easier to use *general purpose*
build systems that happen to be implemented in Python (like waf,
scons, and meson) to handle complex build scenarios, while also
allowing the use of simpler Python-only build systems (like flit) for
distribution of pure Python projects.

That said, the PEP *could* explicitly define a short list of
behaviours that we consider reasonable in an installation tool:

1. Treat the informational output from the build tool as an opaque binary stream
2. Treat the informational output from the build tool as a text stream
encoded using locale.getpreferredencoding(), and decode it using the
backslashreplace error handler
3. Treat the informational output from the build tool as a UTF-8
encoded text stream, and decode it using the backslashreplace error
handler
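Here is a sketch of those three behaviours as a single Python 3.5+ helper. The function name and the strategy labels are invented for this example and are not proposed PEP text:

```python
import locale


def decode_build_output(data, strategy):
    # "binary": option 1, hand the bytes through untouched
    if strategy == "binary":
        return data
    # "locale": option 2, the locale's preferred encoding
    if strategy == "locale":
        encoding = locale.getpreferredencoding()
    # "utf-8": option 3, always UTF-8
    elif strategy == "utf-8":
        encoding = "utf-8"
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    # Undecodable bytes become escapes such as \xf0 rather than raising
    # (backslashreplace only supports decoding on Python 3.5+)
    return data.decode(encoding, "backslashreplace")
```

Either text policy turns a stray b"\xf0" byte into the four visible characters \xf0 instead of raising UnicodeDecodeError.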

We'd just need to caveat the latter two options with the fact that
they'll give you a cryptic error message on Python 3.4 and earlier
(including Python 2):

>>> b"\xf0\x01\x02\x03".decode("utf-8", "backslashreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py27/Lib/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: don't know how to handle UnicodeDecodeError in error callback

I had to look that up on Stack Overflow myself, but what it's trying
to say is that until Python 3.5, "backslashreplace" only worked for
encoding, not for decoding.

That means that for earlier versions, you'd need to define your own
custom error handler as described in
http://stackoverflow.com/questions/25442954/how-should-i-decode-bytes-using-ascii-without-losing-any-junk-bytes-if-xmlch/25443356#25443356
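For reference, here is a minimal custom handler along those lines, registered with codecs.register_error. The handler name "backslashreplace_decode" is hypothetical, and this is a sketch rather than a copy of the Stack Overflow answer. It is written to work on both Python 2.7 and 3.x:

```python
import codecs
import sys


def backslashreplace_decode(exc):
    # Only decoding errors are handled; anything else is re-raised
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # bytearray() yields ints on both Python 2 and 3, so each undecodable
    # byte can be rendered as a visible \xNN escape
    replacement = u"".join(u"\\x%02x" % b for b in bytearray(bad))
    # Resume decoding just after the offending bytes
    return replacement, exc.end


codecs.register_error("backslashreplace_decode", backslashreplace_decode)

# Prefer the built-in handler where it supports decoding (3.5+)
if sys.version_info >= (3, 5):
    DECODE_ERRORS = "backslashreplace"
else:
    DECODE_ERRORS = "backslashreplace_decode"
```

An installation tool could then call data.decode("utf-8", DECODE_ERRORS) on any supported Python version.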

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

