On 10.02.2021 23:10, Eryk Sun wrote:
On 2/10/21, M.-A. Lemburg email@example.com wrote:
setx PYTHONUTF8 1
does the trick in an admin command shell on Windows globally.
The above command sets the variable only for the current user, which I'd recommend anyway. It does not require administrator access. To set a machine value, run `setx /M PYTHONUTF8 1`, which of course requires administrator access. Also, run `set PYTHONUTF8=1` in CMD or `$env:PYTHONUTF8=1` in PowerShell to set the variable in the current shell.
Thanks for the correction.
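For anyone following along, one way to confirm that the variable took effect is to start a new interpreter and look at `sys.flags.utf8_mode`. The snippet below is only a minimal sketch: the helper name `utf8_mode_report` is made up for illustration, and it assumes Python 3.7 or later, where UTF-8 mode exists.

```python
import locale
import os
import sys


def utf8_mode_report():
    """Collect the values that show whether UTF-8 mode is active."""
    return {
        # Raw environment value; None if the variable is not set.
        "PYTHONUTF8": os.environ.get("PYTHONUTF8"),
        # True when the interpreter was started in UTF-8 mode.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # What text-mode open() falls back to when encoding=None.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in utf8_mode_report().items():
        print(f"{key}: {value}")
```

Note that `setx` only affects processes started afterwards, so the check has to run in a fresh shell.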
Unrelated to UTF-8 mode and long-term plans to make UTF-8 the preferred encoding, what I want, from the perspective of writing applications and scripts (not libraries), is a -X option and/or environment variable to make _locale._get_locale_encoding() behave like it does in POSIX. It should return the LC_CTYPE codeset of the current locale, not just the default locale.
That's what getlocale(LC_CTYPE) is intended for, unless I'm missing something.
getdefaultlocale(), which uses _locale._getdefaultlocale() on Windows, is meant to determine the locale settings that setlocale(locale.LC_ALL, '') would set for the current process, without actually making that call.
The reason we have this API is that setlocale() is not thread-safe. Simply calling setlocale(locale.LC_ALL, '') and then resetting the locale again if needed could therefore cause problems in other threads.
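To illustrate the set-and-reset pattern described above, here is a rough sketch of a helper that changes LC_CTYPE temporarily and restores it afterwards. The names `temporary_ctype` and `_locale_lock` are hypothetical. The lock only serializes callers that go through this helper; the locale is still process-wide, so other threads that read it while the block runs will see the temporary value.

```python
import locale
import threading
from contextlib import contextmanager

# Hypothetical module-level lock; it only protects users of this helper.
_locale_lock = threading.Lock()


@contextmanager
def temporary_ctype(name=""):
    """Switch LC_CTYPE to `name` for the duration of the block.

    An empty name selects the user's default locale. Raises locale.Error
    if the requested locale is not available on the system.
    """
    with _locale_lock:
        # Calling setlocale() without a second argument only queries.
        saved = locale.setlocale(locale.LC_CTYPE)
        try:
            # setlocale() returns the name of the locale that is now active.
            yield locale.setlocale(locale.LC_CTYPE, name)
        finally:
            locale.setlocale(locale.LC_CTYPE, saved)
```

Usage would look like `with temporary_ctype("") as active: ...`, where `active` is the locale string reported by the C runtime.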
This would allow setlocale() in Windows to change the default for encoding=None, just as it does in POSIX. Technically it's not hard to implement in a way that's as reliable as nl_langinfo(CODESET) in POSIX. The code page of the current CRT locale is a public field. In Windows 10 the CRT has supported UTF-8 for 3 years -- regardless of the process active code page returned by GetACP(). Just call setlocale(LC_CTYPE, ".UTF-8") or setlocale(LC_CTYPE, (getdefaultlocale()[0], 'UTF-8')).
I think the main problem here is that open() doesn't use locale.getlocale() as default for the encoding parameter, but instead locale.getpreferredencoding(False).
The latter doesn't change when you adjust the locale for the current process on Windows:
    >>> import locale
    >>> locale.getdefaultlocale()
    >>> locale.setlocale(locale.LC_CTYPE, ('de_DE', 'UTF-8'))
    >>> f = open(r'some-file.txt')
    >>> f.encoding
On Linux, locale.getpreferredencoding(False) does return changes made using setlocale().
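The relationship between open() and locale.getpreferredencoding(False) can be checked with a small script. This is only a sketch: the helper names `default_open_encoding` and `same_codec` are invented for the example, and it uses a temporary file instead of `some-file.txt` so that it runs anywhere.

```python
import codecs
import locale
import os
import tempfile


def default_open_encoding():
    """Return (encoding picked by open(), locale.getpreferredencoding(False))."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # No encoding= argument on purpose: we want the default.
        with open(path) as f:
            used = f.encoding
    finally:
        os.remove(path)
    return used, locale.getpreferredencoding(False)


def same_codec(a, b):
    """Compare two encoding names after normalizing aliases and case."""
    return codecs.lookup(a).name == codecs.lookup(b).name


if __name__ == "__main__":
    used, preferred = default_open_encoding()
    print("open() default:", used)
    print("getpreferredencoding(False):", preferred)
    print("same codec:", same_codec(used, preferred))
```

Running it once before and once after a setlocale(LC_CTYPE, ...) call in the same process should show the difference between Windows and Linux described above.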