[Distutils] PEP 517 - specifying build system in pyproject.toml

Thomas Kluyver thomas at kluyver.me.uk
Mon May 22 06:22:15 EDT 2017


I have made a PR against the PEP with my best take on the encoding
situation:
https://github.com/python/peps/pull/264/files

On Mon, May 22, 2017, at 11:19 AM, Paul Moore wrote:
> On 22 May 2017 at 10:56, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> > On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> >> Require that build tools either send UTF-8 to the UI component, or write
> >> bytes to a file and call it a build output. I see no benefit in
> >> requiring both the build tool and the UI tool to guess what the text
> >> encoding is.
> >
> > I'm not proposing that the install tool should try to guess the
> > encoding, but I think a well written install tool shouldn't crash if the
> > build output doesn't match the encoding it expects. Even if the spec
> > says that the build output MUST be UTF-8 encoded, build tools can have
> > bugs, and you don't want the install to fail just because the log
> > isn't correctly encoded.
> >
> > Hence, I think a 'SHOULD' is appropriate for this part of the spec:
> >
> > - To install tool authors, it is clear that they can display the output
> > as UTF-8 so long as they don't crash if it's invalid.
> > - To build tool authors, it's clear that they can't pass the buck to
> > install tool authors if output gets jumbled because it's not UTF-8.
> 
> I'd say that it's not so much just "well written" install tools. I'd
> say that install tools MUST NOT crash if build tool output isn't in
> the expected encoding. On the other hand, the encoding agreement
> implies that if build tools *do* send data in the correct encoding
> then they are entitled to expect that it will be displayed accurately
> to the end user.
> 
> Output can be garbled in two ways:
> 
> 1. The build tool does not (or cannot) ensure that its output is in
> the standard-mandated encoding.
> 2. The install tool cannot display the full range of characters
> representable in the standard-mandated encoding.
> 
> Neither of these should cause a failure. Well written install tools
> should warn in the case of (1) - "I have been passed data that I don't
> understand, I'll do my best to display it but can't guarantee the
> output won't be garbled". In the case of (2), though, that's "as
> expected" - if your OS settings mean you can't display certain
> characters, you shouldn't be surprised if your install tool replaces
> them with a placeholder.
> 
> On an implementation note, this boils down to something like the
> following in the install tool:
> 
>     import warnings
>
>     # Step 1: decode the build output (bytes), falling back to replacement.
>     # errors="replace" is one possible form of replacement.
>     try:
>         data = build_output.decode(STD_ENCODING)
>     except UnicodeDecodeError:
>         warnings.warn("Data is not in expected encoding")
>         data = build_output.decode(STD_ENCODING, errors="replace")
>
>     # Step 2: replace anything the output encoding cannot represent.
>     data = data.encode(MY_OUTPUT_ENCODING,
>                        errors="replace").decode(MY_OUTPUT_ENCODING)
>
>     # We now have subprocess output that's safe to display if requested.
> 
> As a side note, I find step 2 "sanitise my string to ensure it can be
> safely output" to be a pretty common operation - possibly because
> Python's standard IO streams raise exceptions on unicode errors - and
> I'm surprised there isn't a better way to spell it than the
> encode/decode pair above.
> 
> Paul
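
On the side note about step 2: the encode/decode pair can at least be wrapped in a small helper. What follows is only a rough sketch, not anything the PEP specifies. The name safe_for_stream and its parameters are made up for illustration, and it assumes the target stream exposes an `encoding` attribute, falling back to ASCII when it doesn't.

```python
import io
import sys


def safe_for_stream(text, stream=None, errors="replace"):
    """Return `text` with characters the stream cannot encode replaced.

    Hypothetical helper: `errors` is any codec error handler that works
    for encoding, e.g. "replace" or "backslashreplace".
    """
    if stream is None:
        stream = sys.stdout
    # Some stream objects have no encoding (or None); fall back to ASCII.
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, errors=errors).decode(encoding)


# An ASCII-only stream, standing in for a console with a limited encoding.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

print(safe_for_stream("caf\u00e9", ascii_stream))                      # caf?
print(safe_for_stream("caf\u00e9", ascii_stream, "backslashreplace"))  # caf\xe9
```

Using "backslashreplace" instead of "replace" keeps the original code points recoverable from the log, at the cost of noisier output; which is preferable is up to the install tool.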

