[New-bugs-announce] [issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale
Roberto Sánchez
report at bugs.python.org
Mon Aug 31 10:42:12 CEST 2015
New submission from Roberto Sánchez:
System: Python 3.4.2 on Linux Fedora 22
This issue is closely related to http://bugs.python.org/issue19846, but it isn't exactly the same case.
When I connect from my Mac (OS X, using Terminal.app) to a Fedora Linux host over ssh, the terminal session is forced to the OS X locale (the default behavior in Terminal.app):
[rob at fedora22 ~]$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=
However, the locales installed on the Fedora host are:
[rob at fedora22 ~]$ localectl list-locales
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8 <-- This is the default one
And if I launch python3 I get:
[rob at fedora22 ~]$ python3
Python 3.4.2 (default, Jul 9 2015, 17:24:30)
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, codecs, sys, locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> codecs.lookup(locale.getpreferredencoding()).name
'ascii'
>>> locale.getdefaultlocale()
('es_ES', 'UTF-8')
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> sys.getfilesystemencoding()
'ascii'
>>> print('España')
File "<stdin>", line 0
^
SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)
So, if I understand correctly, when the current locale is not supported by the system, Python falls back to ASCII.
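A minimal sketch of how this fallback could be checked from inside the interpreter. The helper name probe_locale is hypothetical; it only uses the standard locale, codecs and sys modules. It assumes that an unsupported locale shows up as a locale.Error when the environment's locale is activated, and it restores the previous setting afterwards. The values it reports depend on the Python version and the host, so none are shown here.

```python
import codecs
import locale
import sys


def probe_locale():
    """Report whether the environment's locale can be activated and which
    encodings the interpreter is currently using (hypothetical helper)."""
    # Calling setlocale() with no locale argument only queries the setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        # An empty string asks the C library to honour LANG / LC_* from the
        # environment; locale.Error is raised if that locale is not installed.
        locale.setlocale(locale.LC_CTYPE, "")
        supported = True
    except locale.Error:
        supported = False
    finally:
        # Put the previous LC_CTYPE back so the probe has no lasting effect.
        locale.setlocale(locale.LC_CTYPE, saved)

    preferred = locale.getpreferredencoding(False)
    return {
        "locale_supported": supported,
        # Normalise aliases such as 'ANSI_X3.4-1968' to the codec name 'ascii'.
        "preferred_encoding": codecs.lookup(preferred).name,
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in probe_locale().items():
        print(key, "=", value)
```

On a host like the one in this report, I would expect locale_supported to be False together with an 'ascii' preferred encoding, but that is an expectation drawn from the session above, not a verified result.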
I can understand this behavior when the supported locales and the current one have different encodings, but if both of them are 'utf-8' it sounds reasonable for locale.getpreferredencoding() to return 'utf-8'.
This breaks programs with a command-line interface (CLI). If you use a third-party library such as click, the library itself raises a RuntimeError. I learned the hard way that Python 3 CLI programs need a valid encoding to deal with stdin/stdout. In this case every system involved seems correctly configured with respect to encoding. This is a real-world case, with no manual changes to the locale configuration, so IMHO the current behavior seems a bit strict.
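As an illustration of the situation described above, a CLI program could detect the mismatch between what the environment asks for and what the interpreter ended up with. This is only a sketch under stated assumptions: the function names (locale_codeset, is_codec, ascii_fallback_suspected) are hypothetical, the locale string is assumed to follow the usual language[_territory][.codeset][@modifier] layout, and the variables are read in the LC_ALL, LC_CTYPE, LANG order. It is not how click or CPython make this decision.

```python
import codecs


def locale_codeset(env):
    """Return the codeset named by LC_ALL, LC_CTYPE or LANG, or None."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # an empty value (e.g. LC_ALL=) counts as unset
            value = value.split("@", 1)[0]  # drop any @modifier
            if "." in value:
                return value.split(".", 1)[1]
            return None  # e.g. "C" or "es_ES" carries no codeset
    return None


def is_codec(name, expected):
    """True if `name` is an alias Python resolves to the codec `expected`."""
    try:
        return codecs.lookup(name).name == expected
    except LookupError:
        return False


def ascii_fallback_suspected(env, preferred_encoding):
    """True when the environment requests UTF-8 but Python reports ASCII."""
    codeset = locale_codeset(env)
    return (
        codeset is not None
        and is_codec(codeset, "utf-8")
        and is_codec(preferred_encoding, "ascii")
    )


# Values taken from the session in this report.
reported_env = {"LANG": "es_ES.UTF-8", "LC_ALL": ""}
print(ascii_fallback_suspected(reported_env, "ANSI_X3.4-1968"))  # True
```

In a real program the arguments would be os.environ and locale.getpreferredencoding(False). When the check fires, the program could print a hint suggesting one of the installed UTF-8 locales instead of failing with a Unicode error.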
----------
components: Unicode
messages: 249390
nosy: ezio.melotti, haypo, rsc1975
priority: normal
severity: normal
status: open
title: Python 3 raises Unicode errors with the xxx.UTF-8 locale
type: behavior
versions: Python 3.4
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24968>
_______________________________________