On 21 August 2014 14:52, Cameron Simpson <cs@zip.com.au> wrote:
> Oh, and I reject Nick's characterisation of POSIX as "broken". It's perfectly internally consistent. It just doesn't match what he wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)
The part that is broken is the idea that locale encodings are a viable way of conveying which encoding to use when talking to the operating system. We've tried trusting them with Python 3, and they're reliably wrong in certain situations. systemd is apparently better than upstart at setting them correctly (e.g. for cron jobs), but even it can't defend against an erroneous (or deliberate!) "LANG=C", or ssh environment forwarding pushing a client's locale to the server.

It's worth looking through some of Armin Ronacher's complaints about Python 3 being broken on Linux, and seeing how many of them boil down to "trusting the locale is wrong, Python 3 should just assume UTF-8 on every POSIX system, the same way it does on Mac OS X". (I suspect ShiftJIS, ISO-2022, et al. users might object to that approach, but it's at least a more viable choice now than it was back in 2008.)

I still think we made the right call in at least *trying* the idea of trusting the locale encoding (since that's the officially supported way of getting this information from the OS), and in many, many situations it works fine. But I suspect we may eventually need to resolve the technical issues currently preventing us from ignoring the environmental locale during interpreter startup and trying something different: for example, always assuming UTF-8, forcing C.UTF-8 if we detect the C locale, or reading the systemd config files and using those, rather than the environmental locale, to set the OS encoding.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
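As a rough illustration of the "don't trust the C locale" fallback floated above, here is a minimal Python sketch of such a policy. It is not CPython's actual startup logic: the helper names `effective_ctype_locale` and `choose_os_encoding` are hypothetical. The sketch reads an environment mapping using the POSIX precedence for the character-type category (`LC_ALL`, then `LC_CTYPE`, then `LANG`). It assumes UTF-8 whenever the locale is C/POSIX, names no codeset, or names a codeset Python doesn't recognise. That last assumption is a deliberate simplification, since a real platform may map a codeset-less locale such as `en_US` to a legacy encoding.

```python
import codecs
from typing import Mapping


def effective_ctype_locale(env: Mapping[str, str]) -> str:
    """Return the locale name governing LC_CTYPE.

    Follows POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values are treated as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # POSIX default when nothing is set


def choose_os_encoding(env: Mapping[str, str]) -> str:
    """Hypothetical policy: trust an explicit codeset, but assume UTF-8 otherwise.

    In particular, the C/POSIX locale is treated as UTF-8 rather than ASCII.
    """
    name = effective_ctype_locale(env)
    if name in ("C", "POSIX"):
        return "utf-8"  # the "ignore LANG=C" case from the discussion
    # Locale names look like language_TERRITORY.codeset@modifier
    _, dot, rest = name.partition(".")
    if not dot:
        # Assumption: no explicit codeset means "assume UTF-8"
        return "utf-8"
    codeset = rest.partition("@")[0]
    try:
        # Normalise to Python's canonical codec name
        return codecs.lookup(codeset).name
    except LookupError:
        # Assumption: unknown codesets fall back to UTF-8
        return "utf-8"
```

Taking a plain mapping rather than reading `os.environ` directly keeps the policy easy to test. For example, `choose_os_encoding({"LANG": "ja_JP.eucJP"})` returns `"euc_jp"`, while `choose_os_encoding({"LANG": "C"})` returns `"utf-8"`.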