[Python-ideas] Force UTF-8 option regardless locale

Nick Coghlan ncoghlan at gmail.com
Mon Aug 29 23:49:21 EDT 2016


On 30 August 2016 at 10:05, INADA Naoki <songofacandy at gmail.com> wrote:
> On Tue, Aug 30, 2016 at 8:14 AM, Victor Stinner
> <victor.stinner at gmail.com> wrote:
>>
>> I proposed the idea, but I'm not sure that we can have a single option
>> for Linux and Windows. Moreover, I never really worked on trying to
>> implement "-X utf8" on Linux, because it looks like "misconfigured
>> systems" are less and less common nowadays. I see very few user
>> requests in this direction.
>
> Some people love tiny Linux images for Docker and Raspberry Pi. They
> don't have any locale other than C.

We run into this for CentOS images as well - the Docker images
currently still default to C, as they don't have C.UTF-8 available
(although you can set LANG=en_US.UTF-8 in your Dockerfile).

(I think Fedora has started defaulting to C.UTF-8 now, but I haven't
actually checked recently)
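
For anyone wanting to check what a given image actually hands to Python,
a minimal sketch along these lines reports the encodings the interpreter
derived from the locale. The helper name describe_encodings is purely
illustrative, and the values you see will depend on the Python version
and the image's locale setup:

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python derived from the current environment."""
    return {
        # Encoding used by default for text files (False = don't call setlocale)
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from str
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of the standard output stream (may be None if replaced)
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("{}: {}".format(name, value))
```

Running it once with LANG=C and once with LANG=en_US.UTF-8 in the same
container is a quick way to see whether the locale is what's driving the
difference.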

> Some ops people love LANG=C or LC_ALL=C because it avoids trouble and
> unexpected performance regressions caused by the locale (e.g. the sort
> command is much slower under ja_JP.utf8).

Broad availability of C.UTF-8 will hopefully help mitigate that
behaviour, but there's still a long transition ahead on that front, as
it seems unlikely "LANG=C" will ever be redefined to mean
"LANG=C.UTF-8" (which would leave folks explicitly requesting
"LANG=C.ASCII" to get the old US-centric behaviour) :(

> I want to write scripts that use UTF-8 for stdio and the filesystem
> encoding. Sometimes people run my script in the C locale, and sometimes
> in a misconfigured locale, because SSH sends a LANG value that the
> system doesn't have.
>
> So I wonder whether Python could have a "Force UTF-8" option, and
> whether it could be a configure option or a site-wide installation
> option, because:
>
> * a command line option cannot be set in a shebang line
> * setting an environment variable may be forgotten when writing things
> like crontab entries.
>
> The option may make startup a bit faster, because it can skip setting
> the locale at startup.
>
> Any thoughts?
> How should the option be set?

While I agree this is a good way to go, we unfortunately don't have a
lot of precedent to work with here :(

The closest we've had to date to a "CPython runtime configuration
file" is the implementation-dependent cert verification config file in
PEP 493: https://www.python.org/dev/peps/pep-0493/#backporting-pep-476-to-earlier-python-versions

Since that was designed specifically as a migration tool for the RHEL
system Python, it glosses over a lot of things we'd need to care about
for a proper config file, like:

- how it works when running from a local checkout
- how (or if) to support parallel installations
- how (or if) to support virtual environments
- how (or if) to support per-user overrides
- how (or if) to support environment variable overrides
- how (or if) to support command line overrides
- how to support Windows
- whether we're defining this as a CPython-only thing, or whether we'd
expect other implementations to support it as well
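
To make the override questions in that list a bit more concrete, here is
a rough sketch of one possible precedence order (command line, then
environment variable, then per-user file, then system-wide file).
Everything in it is hypothetical: the resolve_option helper, the "utf8"
option name and the "PYTHON" + NAME environment variable convention are
assumptions for illustration, not an existing CPython mechanism:

```python
import os


def resolve_option(name, cli_options, environ, user_config, system_config,
                   default=None):
    """Return the first value found, from most to least specific source.

    All four sources are plain mappings; how they would actually be
    populated (and whether each layer should exist at all) is exactly
    the open design question.
    """
    env_name = "PYTHON" + name.upper()  # hypothetical naming convention
    layers = (
        cli_options.get(name),    # hypothetical command line override
        environ.get(env_name),    # hypothetical environment override
        user_config.get(name),    # hypothetical per-user config file
        system_config.get(name),  # hypothetical system-wide config file
    )
    for value in layers:
        if value is not None:
            return value
    return default


if __name__ == "__main__":
    # Only the system-wide layer sets the option here, so it wins
    # unless the (hypothetical) environment variable is set.
    print(resolve_option("utf8", {}, os.environ, {}, {"utf8": "1"}))
```

Even this toy version shows where the hard parts are: it says nothing
about where the per-user and system-wide files live, or how they interact
with parallel installations and virtual environments.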

However, a config file was desirable in the cert verification case for
the same reasons you mention here: so it can be visible system-wide,
without requiring changes to environment variables or command
invocations.

We do have a per-venv config file (pyvenv.cfg), but that's currently
an implementation detail of the 'venv' module, rather than a clearly
defined standard format.
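
For reference, the files that 'venv' writes are just "key = value"
lines, so reading one is straightforward. The sketch below is an
illustrative parser, not the code CPython itself uses, and
parse_pyvenv_cfg is a made-up name. The sample keys are the ones venv
typically writes; any encoding-related key would be a new, hypothetical
addition:

```python
def parse_pyvenv_cfg(text):
    """Parse simple 'key = value' lines into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore blank lines and anything without an '='
        settings[key.strip().lower()] = value.strip()
    return settings


# Example content in the style of a venv-generated pyvenv.cfg
SAMPLE = """\
home = /usr/bin
include-system-site-packages = false
version = 3.5.2
"""

if __name__ == "__main__":
    for key, value in sorted(parse_pyvenv_cfg(SAMPLE).items()):
        print("{} -> {}".format(key, value))
```

The simplicity is appealing, but since the format isn't formally
specified, questions like comments, quoting and unknown keys would need
answers before it could serve as a general runtime config file.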

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

