[New-bugs-announce] [issue9561] distutils: set encoding to utf-8 for input and output files

STINNER Victor report at bugs.python.org
Tue Aug 10 19:51:39 CEST 2010

New submission from STINNER Victor <victor.stinner at haypocalc.com>:

While working on #9425 (support non-ASCII characters in the Python directory name with an ASCII locale), I wrote a patch for distutils.file_util: it sets the encoding to utf-8 and the error handler to surrogateescape. See the patch with comments at:

(The patch is not enough: *all* functions that read files should be patched as well.)

I discussed this with tarek, who told me that it is documented that distutils files have to be utf-8. I didn't find the documentation. I checked read_manifest() in the sdist command: in both Python2 and Python3, it uses the open(name) syntax. This means that Python2 uses the binary API (bytes), whereas Python3 uses the text API (unicode characters) and relies on the open() (TextIOWrapper) heuristic to *guess* the file encoding.
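
To illustrate the guessing problem, here is a minimal sketch (not taken from the patch). The file name and its content are made up, and an ASCII locale is simulated by passing encoding="ascii" explicitly, because the real default depends on the environment:

```python
import locale
import os
import tempfile

# Hypothetical manifest content containing a non-ASCII file name.
content = "docs/caf\u00e9.txt\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "MANIFEST")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)

    # The encoding that open(path) falls back to when none is given.
    # Its value depends on the environment, so it is only recorded here.
    guessed = locale.getpreferredencoding(False)

    # Simulate an ASCII locale: decoding the UTF-8 bytes fails.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit encoding gives the same result in every environment.
    with open(path, encoding="utf-8") as f:
        explicit = f.read()
```

With an explicit encoding, the result no longer depends on the locale of the machine that runs the command.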

I think that it would be better to specify the encoding explicitly in Python3, and maybe to use the text API in Python2 as well.
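
As a rough sketch of what specifying the encoding could look like (this is not the actual patch; read_lines() and write_lines() are hypothetical helpers, not distutils functions), the utf-8 encoding combined with the surrogateescape error handler lets undecodable bytes survive a read/write round trip:

```python
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: read a file as UTF-8, mapping undecodable
    bytes to lone surrogates instead of raising UnicodeDecodeError."""
    with open(path, encoding="utf-8", errors="surrogateescape", newline="") as f:
        return f.read().splitlines()


def write_lines(path, lines):
    """Hypothetical helper: write lines as UTF-8, turning lone surrogates
    produced by surrogateescape back into the original bytes."""
    with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="") as f:
        for line in lines:
            f.write(line + "\n")


# One valid UTF-8 name and one Latin-1 encoded name (invalid as UTF-8).
raw = b"valid-\xc3\xa9.txt\ninvalid-\xe9.txt\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "MANIFEST.in")
    dst = os.path.join(tmp, "MANIFEST.out")
    with open(src, "wb") as f:
        f.write(raw)

    lines = read_lines(src)
    write_lines(dst, lines)

    with open(dst, "rb") as f:
        round_tripped = f.read()
```

The invalid byte 0xE9 is decoded as the lone surrogate U+DCE9 and encoded back to the same byte on output, so the output file is byte-for-byte identical to the input.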

Anyway, before going further (working on patches), I would like the approval of the distutils maintainer(s).

assignee: tarek
components: Distutils, Distutils2, Unicode
messages: 113552
nosy: haypo, merwok, tarek
priority: normal
severity: normal
status: open
title: distutils: set encoding to utf-8 for input and output files
versions: Python 3.2

Python tracker <report at bugs.python.org>
