[New-bugs-announce] [issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

Roberto Sánchez report at bugs.python.org
Mon Aug 31 10:42:12 CEST 2015


New submission from Roberto Sánchez:

System: Python 3.4.2 on Linux Fedora 22

This issue is closely related to http://bugs.python.org/issue19846, but it isn't exactly the same case.

When I connect over ssh from my Mac (OS X, using Terminal.app) to a Linux host running Fedora, the terminal session is forced to the OS X locale (the default behavior in Terminal.app):

    [rob at fedora22 ~]$ locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    LANG=es_ES.UTF-8
    LC_CTYPE="es_ES.UTF-8"
    LC_NUMERIC="es_ES.UTF-8"
    LC_TIME="es_ES.UTF-8"
    LC_COLLATE="es_ES.UTF-8"
    LC_MONETARY="es_ES.UTF-8"
    LC_MESSAGES="es_ES.UTF-8"
    LC_PAPER="es_ES.UTF-8"
    LC_NAME="es_ES.UTF-8"
    LC_ADDRESS="es_ES.UTF-8"
    LC_TELEPHONE="es_ES.UTF-8"
    LC_MEASUREMENT="es_ES.UTF-8"
    LC_IDENTIFICATION="es_ES.UTF-8"
    LC_ALL=

However, the locales installed on the Fedora host are:

    [rob at fedora22 ~]$ localectl list-locales
    en_US
    en_US.iso88591
    en_US.iso885915
    en_US.utf8       <-- This is the default one

And if I launch python3 I get:

    [rob at fedora22 ~]$ python3
    Python 3.4.2 (default, Jul  9 2015, 17:24:30) 
    [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, codecs, sys, locale
    >>> locale.getpreferredencoding()
    'ANSI_X3.4-1968'
    >>> codecs.lookup(locale.getpreferredencoding()).name
    'ascii'
    >>> locale.getdefaultlocale()
    ('es_ES', 'UTF-8')
    >>> sys.stdout.encoding
    'ANSI_X3.4-1968'
    >>> sys.getfilesystemencoding()
    'ascii'
    >>> print('España')
      File "<stdin>", line 0
        
        ^
    SyntaxError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)


So, if I understand correctly, when the current locale is not supported by the system, Python falls back to ASCII.
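
A minimal sketch of how this state could be checked from Python. The helper name diagnose_locale is hypothetical; it relies only on the standard locale, codecs and os modules. Note that calling locale.setlocale() changes the locale of the running process.

```python
import codecs
import locale
import os


def diagnose_locale():
    """Report which locale the environment asks for, whether the C library
    can activate it, and which encoding Python ends up reporting."""
    # First non-empty variable wins: LC_ALL, then LC_CTYPE, then LANG.
    requested = (os.environ.get("LC_ALL")
                 or os.environ.get("LC_CTYPE")
                 or os.environ.get("LANG")
                 or "")
    try:
        # "" means "use whatever the environment variables request".
        locale.setlocale(locale.LC_CTYPE, "")
        supported = True
    except locale.Error:
        # Raised when the requested locale is not installed on this host.
        supported = False
    # Normalize names such as 'ANSI_X3.4-1968' to the codec name 'ascii'.
    encoding = codecs.lookup(locale.getpreferredencoding(False)).name
    return {"requested": requested, "supported": supported, "encoding": encoding}


if __name__ == "__main__":
    print(diagnose_locale())
```

In the session above, the "Cannot set LC_CTYPE to default locale" messages suggest that the locale cannot be activated, which would correspond to the 'ascii' encoding shown in the transcript.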

I can understand this behavior when the supported locales and the current one use different encodings, but if both of them are UTF-8 it seems reasonable for locale.getpreferredencoding() to return 'utf-8'.
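
The suggested behavior could be sketched roughly as follows. This is only an illustration of the idea, not how CPython works: the function names codeset_from_env and proposed_encoding are hypothetical, and the sketch assumes locale names of the form language[_territory][.codeset][@modifier].

```python
import codecs


def codeset_from_env(env):
    """Return the codeset part of the effective locale name, or None."""
    # The first non-empty variable decides: LC_ALL, then LC_CTYPE, then LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            name = value.split("@", 1)[0]  # drop an optional @modifier
            if "." in name:
                return name.split(".", 1)[1]
            return None
    return None


def proposed_encoding(env, locale_encoding):
    """Prefer UTF-8 when the locale fell back to ASCII but the environment
    explicitly asked for a UTF-8 codeset."""
    current = codecs.lookup(locale_encoding).name
    codeset = codeset_from_env(env)
    if current == "ascii" and codeset:
        try:
            if codecs.lookup(codeset).name == "utf-8":
                return "utf-8"
        except LookupError:
            pass  # unknown codeset name: keep the locale's own answer
    return current


# Values taken from the session in this report.
env = {"LANG": "es_ES.UTF-8", "LC_ALL": ""}
print(proposed_encoding(env, "ANSI_X3.4-1968"))
```

With the environment from this report (LANG=es_ES.UTF-8, empty LC_ALL) and the reported 'ANSI_X3.4-1968' encoding, the sketch picks 'utf-8'.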

This case causes programs with a CLI (Command Line Interface) to fail. If you use a third-party library such as click, the library itself raises a RuntimeError. I learned the hard way that Python 3 CLI programs need a valid encoding to deal with stdin/stdout, and in this case every system involved seems to be correctly configured with regard to encoding. This is a real case, with no manual modification of the locale configuration, so IMHO the current behavior seems a bit strict.
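
As a possible stopgap on the application side, a CLI program could rewrap its output stream when it detects the ASCII fallback. The following is a sketch under that assumption; the helper names is_ascii_stream and utf8_writer are hypothetical, and it is not what click does internally. Setting the PYTHONIOENCODING environment variable to utf-8 before starting the interpreter is another way to override the stdin/stdout encoding.

```python
import codecs
import io
import sys


def is_ascii_stream(stream):
    """True if the text stream's encoding resolves to the ASCII codec."""
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        return codecs.lookup(encoding).name == "ascii"
    except LookupError:
        return False


def utf8_writer(stream):
    """Return a UTF-8 text writer over the same underlying byte stream if
    `stream` is stuck on ASCII; otherwise return `stream` unchanged."""
    if is_ascii_stream(stream) and hasattr(stream, "buffer"):
        # Keep a reference to the original stream alive while using the new
        # wrapper (sys.__stdout__ does this for the real standard output).
        return io.TextIOWrapper(stream.buffer, encoding="utf-8",
                                line_buffering=True)
    return stream


if __name__ == "__main__":
    out = utf8_writer(sys.stdout)
    out.write("España\n")
    out.flush()
```

This only addresses output; it does not change sys.getfilesystemencoding() or the way the interactive interpreter decodes its input, so the SyntaxError at the >>> prompt shown above would not be affected.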

----------
components: Unicode
messages: 249390
nosy: ezio.melotti, haypo, rsc1975
priority: normal
severity: normal
status: open
title: Python 3 raises Unicode errors with the xxx.UTF-8 locale
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24968>
_______________________________________

