[issue8775] Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)

Ronald Oussoren report at bugs.python.org
Sat Jul 24 11:14:42 CEST 2010


Ronald Oussoren <ronaldoussoren at mac.com> added the comment:

This issue only seems to be relevant for OSX, and then only for OSX releases before 10.5. As of 10.5, Apple made sure that the LANG variable and the similar LC_* ones specify a UTF-8 encoding, so we're back at the common case where the filesystem encoding matches the locale encoding.
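To see which case a given system is in, one can compare the two encodings directly. This is a minimal sketch: the helper names canonical() and encodings_match() are made up for illustration, and the values reported depend on the platform, the environment and the Python version.

```python
import codecs
import locale
import sys


def canonical(name):
    """Normalize a codec name so that aliases compare equal (e.g. 'UTF8' and 'utf-8')."""
    return codecs.lookup(name).name


def encodings_match():
    """Return (match, filesystem_encoding, locale_encoding) for this process."""
    fs = canonical(sys.getfilesystemencoding())
    # False: report the current locale's encoding without calling setlocale()
    loc = canonical(locale.getpreferredencoding(False))
    return fs == loc, fs, loc


if __name__ == "__main__":
    match, fs, loc = encodings_match()
    print("filesystem encoding:", fs)
    print("locale encoding:    ", loc)
    print("match" if match else "MISMATCH: command-line arguments may be mis-encoded")
```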

A system where the filesystem encoding doesn't match the locale encoding is hard to get right. While it would be possible to add sys.cmdlineencoding, that doesn't actually solve the semantic problem, because external tools might not cooperate.

That is, most system tools seem to work with bytes internally and do not treat arguments as text encoded in the locale encoding that should be re-encoded in the filesystem encoding before passing them to the C APIs.

For example, when calling "ls somefile" the "ls" command will pass the bytes in argv[1] to the POSIX routines for getting file information without trying to re-encode them.
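The effect of passing the bytes through unchanged can be illustrated with a non-ASCII name. This is only a sketch: the two encodings are picked as an assumption for the sake of the example (ISO-8859-1 standing in for the locale encoding, UTF-8 for the filesystem encoding), and argv_bytes() is a hypothetical helper, not an existing API.

```python
def argv_bytes(name, encoding):
    """Bytes a caller would put in argv if it encoded `name` with `encoding`."""
    return name.encode(encoding)


name = "caf\u00e9"  # a file name containing one non-ASCII character

# Assumed for illustration: the locale says ISO-8859-1, the filesystem uses UTF-8.
as_locale = argv_bytes(name, "iso8859-1")
as_fs = argv_bytes(name, "utf-8")

print(as_locale)  # bytes the tool receives in argv[1]
print(as_fs)      # bytes under which the file is stored on disk

# A tool that hands argv[1] to the POSIX calls without re-encoding
# looks up a different byte string than the one on disk.
print(as_locale == as_fs)
```

With these assumptions the two byte strings differ, so a tool that does not re-encode would fail to find the file.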

In short, having a filesystem encoding that is different from the command-line encoding only works when all system tools cooperate and are Unicode-aware.

To be honest, I'd say the behavior of OSX 10.4 is a bug, and we might add a workaround on that platform that uses CFStringGetSystemEncoding() to fetch the actual system encoding when LANG=C.

(And I'm -1 on adding the patch)

See also: issue9167

----------
nosy: +ronaldoussoren

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8775>
_______________________________________

