From ncoghlan at gmail.com Tue Jan 3 01:00:25 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 3 Jan 2017 16:00:25 +1000
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
Message-ID:

Hi folks,

I have a new PEP up for a change I originally suggested as a downstream
patch for Fedora (now that we ship a C.UTF-8 locale by default):
https://www.python.org/dev/peps/pep-0538/

I won't post the whole thing here (since a lot of it is background on the
C locale system in general, as well as various things we've tried in the
past), and instead will just summarise the specific technical changes I'm
proposing:

* in Py_Initialize, emit a warning on stderr regarding limited Unicode
  compatibility if we detect that LC_CTYPE is set to the "C" locale
* in Programs/python.c (i.e. the C level main() implementation), set LANG
  and LC_ALL in the environment to "C.UTF-8" if we detect that the locale
  is otherwise set to "C"
* skip the coercion if PYTHONALLOWCLOCALE is set so developers running in
  recent system Python versions with this implemented can still debug
  problems that only show up in older Python 3.x releases, or in embedding
  applications that still use the C locale
* grant a priori permission to redistributors to backport this to older
  versions (as we'd like to include the change in the Fedora system Python
  for F26, which will be based on Python 3.6.0)

I'm posting it here to ask if anyone sees potential deal-breakers for
other non-Fedora-derived distros before I post it to python-dev for review.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Tue Jan 3 03:43:31 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 3 Jan 2017 18:43:31 +1000
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID:

On 3 January 2017 at 17:51, Felix Yan wrote:
> On 01/03/2017 02:00 PM, Nick Coghlan wrote:
> > I'm posting it here to ask if anyone sees potential deal-breakers for
> > other non-Fedora-derived distros before I post it to python-dev for
> > review.
>
> AFAIK the C.UTF-8 locale is still a downstream patch and not accepted in
> glibc upstream. Arch has closed the request as wontfix some years ago:
> https://bugs.archlinux.org/task/32296

It is, and https://sourceware.org/bugzilla/show_bug.cgi?id=17318 is the
upstream issue I filed with glibc for that. However, upstream glibc are in
the situation where the 1.5 MiB of UTF-8 data is a relatively big addition
for them, while it isn't that big a deal relative to the CPython runtime or
a full Linux kernel.

> IMHO it would be nice to have an option to disable the usage of C.UTF-8.

Yep, that's part of the PEP - if you set PYTHONALLOWCLOCALE, CPython 3.7
would still complain about it, but it wouldn't try to coerce the locale to
something else.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From felixonmars at archlinux.org Tue Jan 3 02:51:15 2017
From: felixonmars at archlinux.org (Felix Yan)
Date: Tue, 3 Jan 2017 15:51:15 +0800
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID:

On 01/03/2017 02:00 PM, Nick Coghlan wrote:
> I'm posting it here to ask if anyone sees potential deal-breakers for
> other non-Fedora-derived distros before I post it to python-dev for review.

AFAIK the C.UTF-8 locale is still a downstream patch and not accepted in
glibc upstream. Arch has closed the request as wontfix some years ago:
https://bugs.archlinux.org/task/32296

IMHO it would be nice to have an option to disable the usage of C.UTF-8.

--
Regards,
Felix Yan

From felixonmars at archlinux.org Tue Jan 3 04:09:28 2017
From: felixonmars at archlinux.org (Felix Yan)
Date: Tue, 3 Jan 2017 17:09:28 +0800
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID: <48d35236-0a44-4cd6-c76b-c88d389c595a@archlinux.org>

On 01/03/2017 04:43 PM, Nick Coghlan wrote:
> On 3 January 2017 at 17:51, Felix Yan wrote:
> > IMHO it would be nice to have an option to disable the usage of C.UTF-8.
>
> Yep, that's part of the PEP - if you set PYTHONALLOWCLOCALE, CPython 3.7
> would still complain about it, but it wouldn't try to coerce the locale
> to something else.

Since we know that our glibc is not providing C.UTF-8, it would be better
to auto-detect the availability of that locale, or simply make it a
configure switch, instead of having to set PYTHONALLOWCLOCALE for every
python process.

--
Regards,
Felix Yan

From ncoghlan at gmail.com Tue Jan 3 04:43:29 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 3 Jan 2017 19:43:29 +1000
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To: <48d35236-0a44-4cd6-c76b-c88d389c595a@archlinux.org>
References:
 <48d35236-0a44-4cd6-c76b-c88d389c595a@archlinux.org>
Message-ID:

On 3 January 2017 at 19:09, Felix Yan wrote:
> On 01/03/2017 04:43 PM, Nick Coghlan wrote:
> > On 3 January 2017 at 17:51, Felix Yan wrote:
> > > IMHO it would be nice to have an option to disable the usage of
> > > C.UTF-8.
> >
> > Yep, that's part of the PEP - if you set PYTHONALLOWCLOCALE, CPython 3.7
> > would still complain about it, but it wouldn't try to coerce the locale
> > to something else.
>
> Since we know that our glibc is not providing C.UTF-8, it would be
> better to auto-detect the availability of that locale, or simply make it
> a configure switch, instead of having to set PYTHONALLOWCLOCALE for
> every python process.

Fedora's glibc still doesn't provide it natively either. Instead, it's
provided as an on disk locale, and hence can be deleted if you really want
to do so: https://bugzilla.redhat.com/show_bug.cgi?id=902094#c18

If it isn't there, CPython 3.7 will still fall back to the C locale, same
as it does for any other missing locale, which will give the following
warning pair on stderr:

===========
Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set
PYTHONALLOWCLOCALE to disable this locale coercion behaviour).
Py_Initialize detected LC_CTYPE=C, which limits Unicode compatibility. Some
libraries and operating system interfaces may not work correctly. Set
`PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar environment when
running Python directly.
===========

(We could potentially give a custom warning in that case by checking the
value of the environment variables, but it would be a bit tricky to write a
test that could be exercised on platforms that *do* provide C.UTF-8)

Distros that want to ship Python 3.7, but don't want their users to see
that warning would then need to do one of three things:

- ship a C.UTF-8 locale on disk, as initially Debian and now also Fedora
  derived distros do
- contribute a fix to glibc that implements a C.UTF-8 locale *without* the
  1.5 MiB of support data (see
  https://bugzilla.redhat.com/show_bug.cgi?id=902094#c14 )
- patch their system Python to remove the warning

Option 2 is the ideal long term result, but 99.999% of Linux users aren't
going to be able to tell the difference between that and distros
collectively opting for Option 1.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From barry at python.org Tue Jan 3 10:56:16 2017
From: barry at python.org (Barry Warsaw)
Date: Tue, 3 Jan 2017 10:56:16 -0500
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID: <20170103105616.51dd26e0@subdivisions.wooz.org>

Hi Nick, thanks for writing up this PEP. I'm generally in favor of it,
although as you point out, it will probably have less direct effect on
Debian and Ubuntu, since we already have the C.UTF-8 locale. I don't have
the time right now to test whether that still holds for various container
and remote environments, but I'll try to do some research there (and would
actually be surprised if that doesn't hold through the entire food chain).

A question and a suggestion.

On Jan 03, 2017, at 04:00 PM, Nick Coghlan wrote:

>* in Py_Initialize, emit a warning on stderr regarding limited Unicode
>compatibility if we detect that LC_CTYPE is set to the "C" locale

So just to be clear, you propose only to check for exactly the "C" locale?
For example, my default locale is en_US.UTF-8 which would not trigger the
warning. I wouldn't want it to warn on any .UTF-8 locale since those should
be fine too. (I.e. it's just C locale's implicit ASCII that's the problem.)

>* in Programs/python.c (i.e. the C level main() implementation), set LANG
>and LC_ALL in the environment to "C.UTF-8" if we detect that the locale is
>otherwise set to "C"
>* skip the coercion if PYTHONALLOWCLOCALE is set so developers running in
>recent system Python versions with this implemented can still debug
>problems that only show up in older Python 3.x releases, or in embedding
>applications that still use the C locale

I have nits to pick about the envar name and warning text.

I understand the desire to have a positive setting affect this, but it
feels more like PYTHONCOERCECLOCALE=0 would be a more descriptive name and
setting. That could be problematic because it doesn't allow any value; i.e.
PYTHONCOERCECLOCALE=1 wouldn't make sense to disable locale coercion. I
think my unease about the name stems from potential misunderstandings about
C vs. C.UTF-8, but maybe I'm just worried about a non-problem. Consider
this a challenge for a better envar name... or a bikeshed to ignore. :)

On to the warnings:

    When Py_Initialize is called and CPython detects that the configured
    locale is the default C locale, the following warning will be issued:

        Py_Initialize detected LC_CTYPE=C, which limits Unicode
        compatibility. Some libraries and operating system interfaces may
        not work correctly. Set `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to
        configure a similar environment when running Python directly.

I find this confusing on several fronts. I think it might be better to say
"Embedded Python" rather than "Py_Initialize" since end users who are using
an application with Python embedded probably will have no idea what
"Py_Initialize" is, and they are the ones who will see this warning first.
It also feels odd to provide instructions on how to reproduce this in
`python` cli from the embedded warning. It also doesn't say that the locale
is being coerced. What about:

    Embedded Python detected LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause Unicode compatibility problems. Coercing the
    locale to C.UTF-8. Set the environment variable PYTHONALLOWCLOCALE=1 to
    prevent this coercion.

If C.UTF-8 isn't available, then the warning would read:

    Embedded Python detected LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause some Unicode compatibility problems.
    Coercion to C.UTF-8 locale is not possible. Set the environment
    variable PYTHONALLOWCLOCALE=1 to suppress this warning.

I'd use the same text for `python` as cli except s/Embedded Python/Python/

I also think there should be a compile-time or run-time flag that embedders
could set so that they could explicitly disable the warning or coercion.
Something like ASCIILOCALEISFINEANDYESIKNOWWHATIAMDOINGSOSTFU=1

>* grant a priori permission to redistributors to backport this to older
>versions (as we'd like to include the change in the Fedora system Python
>for F26, which will be based on Python 3.6.0)

I think that's fine, but I doubt we'll need it for Debian and derivatives.

Cheers,
-Barry

From donald at stufft.io Tue Jan 3 11:16:02 2017
From: donald at stufft.io (Donald Stufft)
Date: Tue, 3 Jan 2017 11:16:02 -0500
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To: <20170103105616.51dd26e0@subdivisions.wooz.org>
References: <20170103105616.51dd26e0@subdivisions.wooz.org>
Message-ID:

> On Jan 3, 2017, at 10:56 AM, Barry Warsaw wrote:
>
> I find this confusing on several fronts.
> I think it might be better to say
> "Embedded Python" rather than "Py_Initialize" since end users who are
> using an application with Python embedded probably will have no idea what
> "Py_Initialize" is, and they are the ones who will see this warning first.

Embedded Python is wrong since this warning is going to get hit by
non-embedded cases too, for instance the default ubuntu docker container
where I regularly have to set LC_CTYPE=C.UTF-8 when using click.

--
Donald Stufft

From barry at python.org Tue Jan 3 11:59:50 2017
From: barry at python.org (Barry Warsaw)
Date: Tue, 3 Jan 2017 11:59:50 -0500
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References: <20170103105616.51dd26e0@subdivisions.wooz.org>
Message-ID: <20170103115950.2b16a29a@subdivisions.wooz.org>

On Jan 03, 2017, at 11:16 AM, Donald Stufft wrote:

>Embedded Python is wrong since this warning is going to get hit by
>non-embedded cases too, for instance the default ubuntu docker container
>where I regularly have to set LC_CTYPE=C.UTF-8 when using click.

I think Nick's original intent was to have two different warnings, one when
`python` cli was used and another when it is not. I'm just trying to
clarify the message in those two cases, but it would be fine with me if it
were the same message for both, e.g. just "Python" in both cases.

Cheers,
-Barry

From donald at stufft.io Tue Jan 3 12:04:52 2017
From: donald at stufft.io (Donald Stufft)
Date: Tue, 3 Jan 2017 12:04:52 -0500
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To: <20170103115950.2b16a29a@subdivisions.wooz.org>
References: <20170103105616.51dd26e0@subdivisions.wooz.org>
 <20170103115950.2b16a29a@subdivisions.wooz.org>
Message-ID: <087C6953-1A43-4B29-9BF2-3C8F73B95599@stufft.io>

> On Jan 3, 2017, at 11:59 AM, Barry Warsaw wrote:
>
> On Jan 03, 2017, at 11:16 AM, Donald Stufft wrote:
>
>> Embedded Python is wrong since this warning is going to get hit by
>> non-embedded cases too, for instance the default ubuntu docker container
>> where I regularly have to set LC_CTYPE=C.UTF-8 when using click.
>
> I think Nick's original intent was to have two different warnings, one
> when `python` cli was used and another when it is not. I'm just trying to
> clarify the message in those two cases, but it would be fine with me if it
> were the same message for both, e.g. just "Python" in both cases.

Ah, my bad then!

--
Donald Stufft

From ncoghlan at gmail.com Wed Jan 4 01:31:32 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 4 Jan 2017 16:31:32 +1000
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To: <20170103105616.51dd26e0@subdivisions.wooz.org>
References: <20170103105616.51dd26e0@subdivisions.wooz.org>
Message-ID:

Note: I've started collating the feedback from the thread at
https://github.com/python/peps/issues/171

On 4 January 2017 at 01:56, Barry Warsaw wrote:
> A question and a suggestion.
>
> On Jan 03, 2017, at 04:00 PM, Nick Coghlan wrote:
>
>>* in Py_Initialize, emit a warning on stderr regarding limited Unicode
>>compatibility if we detect that LC_CTYPE is set to the "C" locale
>
> So just to be clear, you propose only to check for exactly the "C" locale?
> For example, my default locale is en_US.UTF-8 which would not trigger the
> warning. I wouldn't want it to warn on any .UTF-8 locale since those should
> be fine too. (I.e. it's just C locale's implicit ASCII that's the problem.)

It's explicitly checking for whether or not the result of
"setlocale(LC_CTYPE, NULL)" is the exact string "C", as that's what you get
in the cases of interest (i.e. no locale configured, or the configured
locale doesn't exist on the current system).

>>* in Programs/python.c (i.e. the C level main() implementation), set LANG
>>and LC_ALL in the environment to "C.UTF-8" if we detect that the locale is
>>otherwise set to "C"
>>* skip the coercion if PYTHONALLOWCLOCALE is set so developers running in
>>recent system Python versions with this implemented can still debug
>>problems that only show up in older Python 3.x releases, or in embedding
>>applications that still use the C locale
>
> I have nits to pick about the envar name and warning text.
>
> I understand the desire to have a positive setting affect this but it feels
> more like PYTHONCOERCECLOCALE=0 would be a more descriptive name and setting.

That could be done (checking for the exact string "0", the same way we do
for PYTHONHTTPSVERIFY in PEP 493).

> That could be problematic because it doesn't allow any value;
> i.e. PYTHONCOERCECLOCALE=1 wouldn't make sense to disable locale coercion. I
> think my unease about the name stems from potential misunderstandings about C
> vs. C.UTF-8, but maybe I'm just worried about a non-problem. Consider this a
> challenge for a better envar name... or a bikeshed to ignore. :)

It's a fair concern, as I believe the C and C.UTF-8 locales are the same
aside from the default text encoding. The proposal is essentially to coerce
C.ASCII to C.UTF-8 as we've collectively found the former to be
nigh-unusable in practice.

The more I think about it, the more I like the suggested change, as it
means the verb used in the environment variable ("coerce") matches the one
in the warning ("coercing"), rather than relying on folks realising that
"allow" is the opposite of "coerce" in this context.

> On to the warnings:
>
>     When Py_Initialize is called and CPython detects that the configured
>     locale is the default C locale, the following warning will be issued:
>
>         Py_Initialize detected LC_CTYPE=C, which limits Unicode
>         compatibility. Some libraries and operating system interfaces may
>         not work correctly. Set `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to
>         configure a similar environment when running Python directly.
>
> I find this confusing on several fronts. I think it might be better to say
> "Embedded Python" rather than "Py_Initialize" since end users who are using an
> application with Python embedded probably will have no idea what
> "Py_Initialize" is, and they are the ones who will see this warning first.

I avoided the term "embedded", as I think it would be confusing when locale
coercion is disabled for the main Python CLI app.

> It
> also feels odd to provide instructions on how to reproduce this in `python`
> cli from the embedded warning.

That was a request from some of the Fedora folks, as many of the developers
encountering this warning are expected to be software maintenance engineers
that will want to reproduce integration issues in a standalone Python
runtime.

However, I agree it reads strangely, and it's arguably redundant given the
locale coercion warning when running the main Python CLI app.
So I'll drop it from the upstream PEP, and if we decide we really want it
for the Fedora system Python, we can tweak the wording in a downstream
patch.

> It also doesn't say that the locale is being
> coerced.

The embedded runtime *doesn't* do any locale coercion itself - by the time
it runs, it's too late to change the locale, so it just complains without
doing anything about it.

> What about:
>
>     Embedded Python detected LC_CTYPE=C (a locale with default ASCII
>     encoding), which may cause Unicode compatibility problems. Coercing the
>     locale to C.UTF-8. Set the environment variable PYTHONALLOWCLOCALE=1 to
>     prevent this coercion.

Given my above comments, this warning would end up looking something like:

    Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause Unicode compatibility problems. Configuring
    C.UTF-8 as a Unicode-compatible alternative locale is recommended.

> If C.UTF-8 isn't available, then the warning would read:
>
>     Embedded Python detected LC_CTYPE=C (a locale with default ASCII
>     encoding), which may cause some Unicode compatibility problems. Coercion
>     to C.UTF-8 locale is not possible. Set the environment variable
>     PYTHONALLOWCLOCALE=1 to suppress this warning.

Hmm, I hadn't accounted for the fact that the CLI can actually tell whether
or not the coercion to C.UTF-8 worked (as 'setlocale(LC_ALL, "")' will
return NULL if the configured locale doesn't exist).

That means we can try C.UTF-8 first, and then fall back to en_US.UTF-8
(which would be sufficient to get CentOS and RHEL 5/6/7 working
automatically, and likely a lot of other distros as well), before finally
giving up and letting the "C" default stand.
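
[Editorial sketch, not part of the original message.] The fallback order
described above can be illustrated in Python, rather than in the C that the
CLI startup code would actually use. This is only a rough sketch under
stated assumptions: the function name `pick_utf8_locale` and the candidate
tuple are hypothetical, and the probe uses `locale.setlocale`, which raises
`locale.Error` where the C call would return NULL.

```python
import locale

# Candidate target locales, in the order suggested in the message above.
# This tuple is an illustrative assumption, not a list taken from the PEP.
CANDIDATES = ("C.UTF-8", "en_US.UTF-8")


def pick_utf8_locale(candidates=CANDIDATES):
    """Return the first candidate locale the C library accepts, else None.

    Hypothetical helper: it only probes availability and restores the
    original LC_CTYPE setting before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # locale not installed; try the next candidate
            return name
        return None  # nothing usable: the "C" default would stand
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("usable UTF-8 locale:", pick_utf8_locale())
```

A `None` result corresponds to the final step in the message: giving up and
letting the "C" default stand.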
> I'd use the same text for `python` as cli except s/Embedded Python/Python/

If you missed it, I think I need to better highlight in the PEP that the
library does not, and cannot, coerce the locale to C.UTF-8: Py_Initialize
runs too late in the startup process for that to work the way we would want
it to. The changes needed to incorporate the config flags and preprocessor
definitions discussed below should help with that.

> I also think there should be a compile-time or run-time flag that embedders
> could set so that they could explicitly disable the warning or coercion.
> Something like ASCIILOCALEISFINEANDYESIKNOWWHATIAMDOINGSOSTFU=1

Ugh, M4 macros :)

But yeah, that's a good idea. Since the runtime initialization warning and
the CLI locale coercion are technically independent, what do you think
about adding two flags:

* --with[out]-c-locale-coercion (setting PY_COERCE_C_LOCALE for the CLI
  behaviour)
* --with[out]-c-locale-warning (setting PY_WARN_ON_C_LOCALE for the runtime
  initialization behaviour)

>>* grant a priori permission to redistributors to backport this to older
>>versions (as we'd like to include the change in the Fedora system Python
>>for F26, which will be based on Python 3.6.0)
>
> I think that's fine, but I doubt we'll need it for Debian and derivatives.

If more people were in the habit of setting sensible locales in their
Docker base images, I doubt I would be bothered about it for Fedora et al
either.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From felixonmars at archlinux.org Tue Jan 3 21:56:27 2017
From: felixonmars at archlinux.org (Felix Yan)
Date: Wed, 4 Jan 2017 10:56:27 +0800
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
 <48d35236-0a44-4cd6-c76b-c88d389c595a@archlinux.org>
Message-ID:

On 01/03/2017 05:43 PM, Nick Coghlan wrote:
> If it isn't there, CPython 3.7 will still fall back to the C locale,
> same as it does for any other missing locale

I see, that's exactly what I wanted to know.

> Distros that want to ship Python 3.7, but don't want their users to
> see that warning would then need to do one of three things:

I am fine with having the warning around, I just wanted to be sure that it
won't just break.

Thanks for the detailed info!

--
Regards,
Felix Yan

From barry at python.org Wed Jan 4 09:35:37 2017
From: barry at python.org (Barry Warsaw)
Date: Wed, 4 Jan 2017 09:35:37 -0500
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References: <20170103105616.51dd26e0@subdivisions.wooz.org>
Message-ID: <20170104093537.5b5ee0f4@subdivisions.wooz.org>

Thanks Nick. I think we're pretty much in agreement on the details, and I'm
happy to continue fleshing out the PEP on GH. Just a few comments here.

On Jan 04, 2017, at 04:31 PM, Nick Coghlan wrote:

>I avoided the term "embedded", as I think it would be confusing when
>locale coercion is disabled for the main Python CLI app.
>
>> It
>> also feels odd to provide instructions on how to reproduce this in `python`
>> cli from the embedded warning.
>
>That was a request from some of the Fedora folks, as many of the
>developers encountering this warning are expected to be software
>maintenance engineers that will want to reproduce integration issues
>in a standalone Python runtime.
>
>However, I agree it reads strangely, and it's arguably redundant given
>the locale coercion warning when running the main Python CLI app.
>So I'll drop it from the upstream PEP, and if we decide we really want it
>for the Fedora system Python, we can tweak the wording in a downstream
>patch.

I like the sample text in #171.

>If you missed it, I think I need to better highlight in the PEP that
>the library does not, and cannot, coerce the locale to C.UTF-8:
>Py_Initialize runs too late in the startup process for that to work
>the way we would want it to.

Yes, I definitely missed this and I think it's important to emphasize that
in the PEP. I think it would be worth adding some text/sample code about
how embedders can properly initialize their own applications before they
call Py_Initialize().

>But yeah, that's a good idea. Since the runtime initialization warning
>and the CLI locale coercion are technically independent, what do you
>think about adding two flags:
>
>* --with[out]-c-locale-coercion (setting PY_COERCE_C_LOCALE for the
>CLI behaviour)
>* --with[out]-c-locale-warning (setting PY_WARN_ON_C_LOCALE for the
>runtime initialization behaviour)

This works for me *if* there's really clear documentation about the scope
of the effects of these flags, i.e. just as you say above: coercion can
only happen in the `python` cli, while the coercion can't happen (and thus
only the warning can be issued) in the library/embedded case.

>If more people were in the habit of setting sensible locales in their
>Docker base images, I doubt I would be bothered about it for Fedora et
>al either.

Funnily, I ran into a case yesterday where C.UTF-8 *wasn't* set by default
on Debian. We have a suite of tools for building packages (well, several,
but in this case it's sbuild/schroot) and inside the isolated build
environment, C.ASCII is apparently the default.
I believe there are open bugs about trying to change this to C.UTF-8 but in
any case I was working on a package that failed to build until I explicitly
set the locale to C.UTF-8 in the rules file, because the upstream test
suite implicitly used ASCII to decode some bytes.

So I think this will be useful even for Debian/Ubuntu although we'll likely
see lots of these warnings in the package build environment. (Unless, and I
haven't checked, the standard Python build helpers already set C.UTF-8; the
package in question doesn't use the standard Python helpers.)

Cheers,
-Barry

From robertc at robertcollins.net Wed Jan 4 13:08:06 2017
From: robertc at robertcollins.net (Robert Collins)
Date: Thu, 5 Jan 2017 07:08:06 +1300
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID:

One thing: is the warning on stderr really needed? It's pretty poor form
for the VM to communicate with the user except in extraordinary
circumstances like exception handler of last resort.

We have verbose mode, and warning on that would make sense to me.

Rob

From ncoghlan at gmail.com Wed Jan 4 21:09:31 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 5 Jan 2017 12:09:31 +1000
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID:

On 5 January 2017 at 04:08, Robert Collins wrote:
> One thing: is the warning on stderr really needed? It's pretty poor form for
> the VM to communicate with the user except in extraordinary circumstances
> like exception handler of last resort.

I think it's appropriate in this case, as the major problem here is that
the assumption of ASCII as the preferred text encoding in the "C" locale is
an antiquated Anglo-centric default in *C* that has been superseded by
UTF-8 or UTF-16-LE in newer languages and runtimes.

While I'm not planning to hold my breath, I kinda hope we'll see a change
over the next several years where C.ASCII and POSIX.ASCII are introduced as
aliases for the current C and POSIX locales, with those two names
subsequently transitioning to become aliases for C.UTF-8 (with the latter
being reported as the canonical locale name so fallbacks like the one
proposed in PEP 538 don't trigger).

So what this PEP is doing is taking CPython and saying that we no longer
respect the default locale handling behaviour defined by the C language
standards, as we think they're wrong (based on our experiences attempting
to use them in a multilingual locale dependent application), and so we're
overruling them. If people insist on using the default locale anyway, their
Python 3 runtime isn't going to work properly, and it isn't a bug in
CPython, it's a bug in their environmental configuration.

That gives the rationale for the two different warnings:

- overruling the default C locale after decades of respecting it (or at
  least trying to) is a big deal, and hence worth warning about
- Python 3's Unicode support is genuinely unreliable in the C locale
  (everywhere other than Mac OS X), and hence worth warning about

> We have verbose mode, and warning on that would make sense to me.

Folks that encounter Python 3's deficiencies in the C locale (or hit
integration issues arising from the new implicit locale reconfiguration
behaviour) without a runtime warning of some kind aren't likely to think "I
should run Python in verbose mode to learn more about what's going on",
they're more likely to think "Python 3 is broken, I'm going to use
something else".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Jan 7 03:29:15 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 7 Jan 2017 18:29:15 +1000
Subject: [Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8
In-Reply-To:
References:
Message-ID:

On 3 January 2017 at 16:00, Nick Coghlan wrote:
> I'm posting it here to ask if anyone sees potential deal-breakers for other
> non-Fedora-derived distros before I post it to python-dev for review.

Thanks for all the feedback folks, I've now incorporated it into the PEP
here:
https://github.com/python/peps/commit/221099d8765125bbd798e869846b005bcca84b47

With the SIG's feedback incorporated, the next round of discussion will
take place on python-ideas in parallel with the PEP 540 discussions :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia