[issue19685] open() fails to autodetect utf-8 if LANG=C

Curtis Doty report at bugs.python.org
Thu Nov 21 22:21:02 CET 2013


New submission from Curtis Doty:

I first stumbled across this bug while attempting to install a package using pip's cool editable mode:

$ pip install -e git+git://github.com/appliedsec/pygeoip.git#egg=pygeoip
Obtaining pygeoip from git+git://github.com/appliedsec/pygeoip.git#egg=pygeoip
  Cloning git://github.com/appliedsec/pygeoip.git to ./src/pygeoip
  Running setup.py egg_info for package pygeoip
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/home/curtis/python/3.3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1098: ordinal not in range(128)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/home/curtis/python/3.3.3/lib/python3.3/encodings/ascii.py", line 26, in decode

    return codecs.ascii_decode(input, self.errors)[0]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1098: ordinal not in range(128)

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /home/curtis/python/2013-11-20/src/pygeoip
Storing complete log in /home/curtis/.pip/pip.log


It turns out this is related to LANG=C being set in my local environment. If I set LANG=en_US.UTF-8, the problem goes away. But it seems that pip, or python3's open(), should handle this more intelligently.
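A minimal sketch of the mechanism, assuming Python 3. When open() is called in text mode without an encoding argument, it uses locale.getpreferredencoding(False). Under LANG=C on the reporter's 3.3 build that value was 'ANSI_X3.4-1968', i.e. ASCII. Newer releases (3.7+) change this default through PEP 538 and PEP 540, so the printed value depends on the interpreter and environment. To avoid depending on the environment, the sketch passes each encoding explicitly to the same bytes. The helper name `try_decode` is hypothetical.

```python
import locale
import os
import tempfile


def try_decode(path, encoding):
    """Read a whole text file with an explicit encoding.

    Returns the decoded text, or None if the bytes are invalid in that encoding.
    """
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        return None


# Write UTF-8 bytes with one non-ASCII character: U+00E9 encodes as 0xc3 0xa9.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write("name = 'caf\u00e9'\n".encode("utf-8"))

# What a bare open(path) would use in this environment.
print("locale default:", locale.getpreferredencoding(False))
print("ascii:", try_decode(path, "ascii"))  # None: byte 0xc3 is not ASCII
print("utf-8:", ascii(try_decode(path, "utf-8")))
os.remove(path)
```

The same bytes succeed or fail depending only on which codec is chosen. A bare open() leaves that choice to LANG.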

Worse, the file in this case, https://github.com/appliedsec/pygeoip/blob/master/setup.py, already has a source encoding declaration (a coding comment) *declaring* it as utf-8.
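A coding declaration (PEP 263) tells the Python compiler how to decode a source file when that file is run or imported. It does not change what a plain open() call does, which is why it is ignored here. The standard library can read the declaration itself: tokenize.detect_encoding() looks at the first two lines for a BOM or coding comment, and tokenize.open() opens the file in text mode using the detected encoding. The file contents below are a made-up stand-in for a setup.py.

```python
import os
import tempfile
import tokenize

# Hypothetical stand-in for a setup.py that declares UTF-8 and contains non-ASCII text.
source = "# -*- coding: utf-8 -*-\nname = 'caf\u00e9'\n"
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(source.encode("utf-8"))

# detect_encoding() takes a readline callable over a *binary* file.
with open(path, "rb") as f:
    encoding, _ = tokenize.detect_encoding(f.readline)
print(encoding)  # utf-8

# tokenize.open() decodes with the declared encoding, regardless of LANG.
with tokenize.open(path) as f:
    text = f.read()
print(text == source)  # True
os.remove(path)
```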

An ugly workaround is to patch pip so that it always decodes setup.py as UTF-8:

--- pip.orig/req.py	2013-11-19 15:53:49.000000000 -0800
+++ pip/req.py	2013-11-20 16:37:23.642656132 -0800
@@ -281,7 +281,7 @@ def replacement_run(self):
             writer(self, ep.name, os.path.join(self.egg_info,ep.name))
     self.find_sources()
 egg_info.egg_info.run = replacement_run
-exec(compile(open(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
+exec(compile(open(__file__,encoding='utf_8').read().replace('\\r\\n', '\\n'), __file__, 'exec'))
 """
 
     def egg_info_data(self, filename):
@@ -687,7 +687,7 @@ exec(compile(open(__file__).read().repla
             ## FIXME: should we do --install-headers here too?
             call_subprocess(
                 [sys.executable, '-c',
-                 "import setuptools; __file__=%r; exec(compile(open(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))" % self.setup_py]
+                 "import setuptools; __file__=%r; exec(compile(open(__file__,encoding='utf_8').read().replace('\\r\\n', '\\n'), __file__, 'exec'))" % self.setup_py]
                 + list(global_options) + ['develop', '--no-deps'] + list(install_options),
 
                 cwd=self.source_dir, filter_stdout=self._filter_install,
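One alternative to hard-coding encoding='utf_8' in the shim is to read setup.py as bytes and pass them straight to compile(). compile() accepts bytes and decodes them itself, honoring a PEP 263 coding comment and defaulting to UTF-8 when there is none. This is a sketch of that idea, not pip's code, and the helper names `exec_source_file` and `write_bytes` are hypothetical. Unlike the utf_8 patch above, it also accepts a file that declares latin-1.

```python
import os
import tempfile


def exec_source_file(path):
    """Hypothetical helper: run a setup.py-style file and return its namespace.

    The file is read in binary mode, so no locale-dependent decoding happens here;
    compile() decodes the bytes, honoring a coding comment (default UTF-8).
    """
    with open(path, "rb") as f:
        source = f.read().replace(b"\r\n", b"\n")  # same newline fix-up as the shim
    namespace = {"__file__": path, "__name__": "__main__"}
    exec(compile(source, path, "exec"), namespace)
    return namespace


def write_bytes(data):
    """Write raw bytes to a new temporary .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# A UTF-8 file without a coding comment, and a Latin-1 file that declares one.
utf8_path = write_bytes("name = 'caf\u00e9'\n".encode("utf-8"))
latin1_path = write_bytes("# -*- coding: latin-1 -*-\nname = 'caf\u00e9'\n".encode("latin-1"))

for p in (utf8_path, latin1_path):
    print(ascii(exec_source_file(p)["name"]))  # 'caf\xe9' both times
    os.remove(p)
```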


But that only treats the symptom. The root cause appears to be in Python 3, as demonstrated by this simple script:

wrong-codec.py:
#!/usr/bin/env python3
from urllib.request import urlretrieve
urlretrieve('https://raw.github.com/appliedsec/pygeoip/master/setup.py', filename='setup.py')

# if LANG=C then locale.py:getpreferredencoding() -> 'ANSI_X3.4-1968'
foo = open('setup.py')

# bang! ascii_decode() cannot decode the non-ASCII UTF-8 bytes
bar = foo.read()


This does not occur in Python 2. Is this a bug in pip or in Python 3?
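Python 2 avoids this because its built-in open() returns byte strings and never decodes them. The closest Python 3 equivalent is binary mode ('rb'), where LANG plays no part, followed by an explicit decode step that uses a codec the caller picks. The minimal sketch below assumes a small UTF-8 file created on the spot.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

# Binary mode: no decoding, so the locale cannot break it (like Python 2's open()).
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'caf\xc3\xa9\n'

# Decoding is now a separate, explicit step with a chosen codec.
text = raw.decode("utf-8")
os.remove(path)
```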

----------
components: Unicode
messages: 203673
nosy: GreenKey, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: open() fails to autodetect utf-8 if LANG=C
type: crash
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19685>
_______________________________________


More information about the Python-bugs-list mailing list