[Distutils] PEP 517 - specifying build system in pyproject.toml

Tue May 23 06:04:27 EDT 2017

On 23 May 2017 at 09:56, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> I may have missed it, but has anyone proposed what it should do if it
> wants to send characters which can't be encoded in the locale encoding?

No, it's not been mentioned; the focus has been on running build
tools such as compilers. The best answer I can give is to use a
replace or backslashreplace error handler. I agree this is suboptimal,
but see below.
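
To make the (backslash)replace suggestion concrete, here is a minimal
Python sketch. The function name `encode_for_locale` is hypothetical,
and cp1252 is only an assumed stand-in for whatever the locale encoding
happens to be. `errors="backslashreplace"` and `errors="replace"` are
the standard `str.encode` error handlers.

```python
def encode_for_locale(message: str, encoding: str = "cp1252") -> bytes:
    """Encode a backend message for a consumer that expects `encoding`.

    Hypothetical helper; cp1252 is an assumed default. Characters the
    encoding cannot represent are written as escapes (e.g. \\u2713)
    instead of raising UnicodeEncodeError.
    """
    return message.encode(encoding, errors="backslashreplace")


message = "caf\u00e9 \u2713"  # "café" plus a check mark (U+2713)

# "é" exists in cp1252 (byte 0xE9); the check mark does not, so it is escaped.
escaped = encode_for_locale(message)  # b"caf\xe9 \\u2713"

# The plain "replace" handler loses the character entirely.
lossy = message.encode("cp1252", errors="replace")  # b"caf\xe9 ?"

# The default "strict" handler fails instead.
strict_error = None
try:
    message.encode("cp1252")
except UnicodeEncodeError as exc:
    strict_error = exc
```

The reader still sees something recognisable (`\u2713`) rather than a
crash, which is why this is tolerable, but not good, as a fallback.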

> Paths on Windows are handled natively as UTF-16, as I understand it, so
> it's entirely possible for them to contain characters which can't be
> represented in, say, CP1252.

Agreed. In practice, though, the vast bulk of the issues reported for
pip seem to involve filename characters or localised messages in the
ANSI/OEM codepages. But I agree that in theory this is an issue.

> Given this, and the workarounds Nick has pointed out are necessary for
> systems where the locale thinks it's ASCII, I still think that
> specifying "UTF-8" is a better option than trying to work with locale
> encodings. We're building a new spec for new tools in 2017, let's not
> prolong the pain of platform-dependent default encodings further.

However, if we do this, existing build tools (compilers, etc.) that we
have to support will still use platform-dependent encodings. That's a
reality that we can't wish away. And the majority of real-life issues
reported on pip are with compilation errors. So do we require backends
that run these tools to transcode the output, or do we risk
significant output corruption, because (essentially) every high-bit
character in the compiler output is invalid UTF-8 and will be
replaced?
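
A rough Python sketch of what "transcode the output" would mean for a
backend, and of the corruption described above when the same bytes are
read as UTF-8. `transcode_to_utf8` and `run_build_tool` are
hypothetical names, and the assumption that the compiler writes in the
locale's preferred encoding is exactly the part that may not hold
(Windows console tools often use the OEM code page instead).

```python
import locale
import subprocess
import sys
from typing import List, Optional


def transcode_to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Re-encode a build tool's raw output from its platform encoding to UTF-8."""
    if source_encoding is None:
        # Assumption: the tool writes in the locale's preferred encoding.
        # Some Windows console tools use the OEM code page instead.
        source_encoding = locale.getpreferredencoding(False)
    # Decode with the tool's own encoding so high-bit bytes become the
    # characters the tool meant; truly undecodable bytes become U+FFFD.
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


def run_build_tool(cmd: List[str]) -> bytes:
    """Hypothetical backend helper: run a tool and return its output as UTF-8."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return transcode_to_utf8(result.stdout)


# A localised compiler message encoded in cp1252 ("ü" is the single byte 0xFC).
raw = b"Fehler: ung\xfcltige Option"

# Without transcoding: reading the bytes as UTF-8 replaces "ü" with U+FFFD.
naive = raw.decode("utf-8", errors="replace")  # "Fehler: ung\ufffdltige Option"

# With transcoding: the character survives as the UTF-8 bytes C3 BC.
fixed = transcode_to_utf8(raw, "cp1252")  # b"Fehler: ung\xc3\xbcltige Option"

# Running a real command (here, the current Python interpreter).
out = run_build_tool([sys.executable, "-c", "print('ok')"])
```

The cost is that every backend has to know, or guess, each tool's
output encoding; getting that guess wrong just moves the corruption
somewhere else.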

I agree 100% that UTF-8 is in theory the right thing. My focus,
though, is on the practical aspects of minimising the risk of
repeating the kinds of issues we have actually seen on pip in the
past, and "don't require backends that run compilers to transcode the
output" seems to me to be the most likely route to achieve that.

Having said that, I won't be the one writing those backends. If
people like Steve are OK with transcoding (or with having pip issues
saying "I can't read the compiler output" passed back to them as
backend issues), then I'm not going to argue against UTF-8.

Paul