[Python-ideas] PEP 540: Add a new UTF-8 mode

Wed Jan 11 06:27:43 EST 2017

On Wed, Jan 11, 2017 at 7:46 PM, Stephan Houben <stephanh42 at gmail.com> wrote:
> Hi INADA Naoki,
>
> (Sorry, I am unsure if INADA or Naoki is your first name...)

Never mind, I don't care about name ordering. (INADA is my family name.)

>
> While I am very much in favour of everything working "out of the box",
> an issue is that we don't have control over external code
> (be it Python extensions or external processes invoked from Python).
>
> And that code will only look at LANG/LC_CTYPE and ignore any cleverness
> we build into Python.
>

I'm sorry, could you give me a more concrete example?

I'm +1 on PEP 540: there should be an option to ignore the locale
setting.  (And I hope it will become the default in a future version.)
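
As a rough sketch of what such an option looks like: the snippet below
assumes the `-X utf8` command-line switch and the `sys.flags.utf8_mode`
flag in the form PEP 540 eventually shipped (Python 3.7 and later). The
helper name `utf8_mode_in_child` is made up for illustration.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(extra_args=()):
    """Start a child interpreter under LC_ALL=C and return its sys.flags.utf8_mode."""
    env = dict(os.environ, LC_ALL="C")
    # Drop PYTHONUTF8 so only the command-line switch decides the mode.
    env.pop("PYTHONUTF8", None)
    cmd = [sys.executable, *extra_args,
           "-c", "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("with -X utf8:  ", utf8_mode_in_child(["-X", "utf8"]))
    print("with -X utf8=0:", utf8_mode_in_child(["-X", "utf8=0"]))
```

The same switch can also be set through the PYTHONUTF8 environment
variable, which is convenient when the command line is not under your
control.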

What is your concern?

> For example, this may mean that a built-in Python string sort will give you
> a different ordering than invoking the external "sort" command.
> I have been bitten by this kind of issues, leading to spurious "diffs" if
> you try to use sorting to put strings into a canonical order.
>
> So my feeling is that people are ultimately not being helped by
> Python trying to be "nice", since they will be bitten by locale issues
> anyway. IMHO ultimately better to educate them to configure the locale.
> (I realise that people may reasonably disagree with this assessment ;-) )
>
> I would then recommend setting the locale to en_US.UTF-8, which is slower and
> less elegant but at least more widely supported.

But some people can't accept a 30x slowdown just for sorting ASCII text.
At least, the infrastructure engineers at my company love the C locale.
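
To make the sorting difference concrete, here is a minimal sketch that
contrasts Python's built-in code point ordering with locale-aware
collation via `locale.strxfrm`. The helper names are hypothetical, and
whether a locale such as en_US.UTF-8 is installed (and how it orders
strings) depends on the system, so the helper returns None when the
locale is missing.

```python
import locale


def sort_by_code_point(items):
    """Built-in sort: compares strings by Unicode code point, ignoring the locale."""
    return sorted(items)


def sort_by_locale(items, locale_name):
    """Sort with the collation rules of locale_name, or return None if unavailable."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(items, key=locale.strxfrm)
    finally:
        # Restore the collation locale that was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    words = ["b", "A", "a", "B"]
    print("code point: ", sort_by_code_point(words))
    print("C locale:   ", sort_by_locale(words, "C"))
    print("en_US.UTF-8:", sort_by_locale(words, "en_US.UTF-8"))
```

In the C locale the two orderings agree. Under a language-specific
locale the result may interleave upper and lower case instead, which is
the kind of mismatch with the external "sort" command described above.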

New Python programmers (e.g. the many data scientists learning Python)
may want to work on a Linux server, and learning about locales is not
their concern.
Web programmers are the same: they just want to print UTF-8.
Learning about locales may not be worth the effort for them.
But I think there should be an option, and I want to use it.

>
> By the way, I know a bit how Node.js deals with locales, and it doesn't try
> to compensate for "C" locales either. But what it *does* do is that
> Node never uses the locale settings to determine the encoding of a file:
> you either have to specify it explicitly OR it defaults to UTF-8 (the latter
> on output only).
> So in this respect it is by specification immune against misconfiguration of
> the encoding.
> However, other stuff (e.g. date formatting) will still be influenced by the
> "C" locale
> as usual.
>
>
> Stephan
>

Yes.  Both PEP 538 and PEP 540 are about encoding.
I'm sorry about my misleading term "locale-free".
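
For comparison with the Node.js behaviour described above, a small
sketch of the "never ask the locale for the file encoding" approach in
Python: pass `encoding=` explicitly to `open()`. The helper name
`roundtrip_utf8` is only for illustration.

```python
import locale
import os
import tempfile


def roundtrip_utf8(text):
    """Write text to a temporary file and read it back, always as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # An explicit encoding= makes the result independent of LANG/LC_CTYPE.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # The locale-derived encoding is what the locale module reports here.
    print("locale preferred encoding:", locale.getpreferredencoding(False))
    print("round trip:", roundtrip_utf8("日本語"))
```

This only covers the encoding. Other locale-dependent behaviour, such as
date formatting, is unaffected by it.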

There should still be locale support for time formatting, at least with a
UTF-8 locale.

Regards,