[New-bugs-announce] [issue12451] open: avoid the locale encoding when possible

STINNER Victor report at bugs.python.org
Thu Jun 30 14:20:51 CEST 2011

New submission from STINNER Victor <victor.stinner at haypocalc.com>:

In Python 3, open() uses the locale encoding when opening a text file if the encoding argument is not specified (implicit). Some functions rely on this implicit locale encoding even though it is not the right encoding for them. I see at least three cases where the encoding should be changed:

 - UTF-8 should be used instead for portability: using the locale encoding there is a bug in the module
 - ASCII must be used instead: the module doesn't support non-ASCII characters (old file formats, old network protocols, some fields of a document, etc.)
 - ASCII can be used instead: it's just a micro-optimization, since the ASCII encoding is a little bit faster
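
As a minimal sketch of the first two cases (the file name and sample text are made up for illustration, and this is not code from the attached patch): an explicit UTF-8 encoding produces the same bytes on every platform, while an explicit ASCII encoding rejects non-ASCII data as soon as it is written.

```python
import locale
import os
import tempfile

# Encoding that open() is expected to fall back to on this interpreter
# when no encoding argument is given; it depends on the environment.
implicit = locale.getpreferredencoding(False)

text = "caf\u00e9"  # sample text with one non-ASCII character
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Case 1: explicit UTF-8 writes the same bytes whatever the locale is.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "rb") as f:
    utf8_bytes = f.read()

# Case 2: a format restricted to ASCII fails early on non-ASCII data.
try:
    with open(path, "w", encoding="ascii") as f:
        f.write(text)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

With the implicit encoding, the bytes written for the same text would depend on the value of `implicit`, which is exactly the portability problem described above.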

To detect the usage of the implicit locale encoding, some functions can be monkeypatched:

 - builtins.open, io.open, _pyio.open
 - io.TextIOWrapper, _pyio.TextIOWrapper
 - more functions that use open/TextIOWrapper directly or indirectly may be patched to emit the warning earlier

The attached open_hook.patch implements these hooks (hacks?) in the site module: it emits a ResourceWarning. Use python -Werror to raise an error if the locale encoding is used implicitly. If you really want to use the locale encoding, pass encoding='locale' to silence the warning.
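
The effect of -Werror on such a warning can be reproduced inside a program with the warnings module. In this sketch, check_encoding is a hypothetical stand-in for a hooked open(), not a function from the patch:

```python
import warnings

def check_encoding(encoding):
    # Hypothetical stand-in for the hooked open(): warn on implicit encoding.
    if encoding is None:
        warnings.warn("implicit locale encoding", ResourceWarning, stacklevel=2)
    return encoding

with warnings.catch_warnings():
    # -Werror installs an "error" filter for all warning categories;
    # here it is limited to ResourceWarning.
    warnings.simplefilter("error", ResourceWarning)
    try:
        check_encoding(None)       # implicit: the warning becomes an exception
        raised = False
    except ResourceWarning:
        raised = True
    explicit = check_encoding("utf-8")  # explicit: nothing is raised
```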

Almost all functions in Python use the implicit locale encoding. For example, Python doesn't even start with the patch applied and -Werror. If you use -Werror, you have to patch *all* calls to open()/TextIOWrapper to be able to locate real bugs, or the program will stop before hitting the real problems. Each time, you have to check what the real expected encoding is, which takes a lot of time.

I started this huge project. I'm using ASCII most of the time (especially in Python tests), but I don't know if it's correct. It will require a second step to ensure that each function really doesn't use/support non-ASCII characters.

I will use this issue to reference my commits, attach patches, and more generally discuss this topic.

components: Unicode
files: open_hook.patch
keywords: patch
messages: 139473
nosy: haypo
priority: normal
severity: normal
status: open
title: open: avoid the locale encoding when possible
versions: Python 3.3
Added file: http://bugs.python.org/file22520/open_hook.patch

Python tracker <report at bugs.python.org>
