[New-bugs-announce] [issue2128] sys.argv is wrong for unicode strings

Giovanni Bajo report at bugs.python.org
Sat Feb 16 17:27:46 CET 2008

New submission from Giovanni Bajo:

Under Windows, sys.argv is created through the Windows ANSI API.

When a file or directory name can't be represented in the
system encoding (e.g. a Japanese-named file or directory on a Western
Windows), Windows encodes the filename to the system encoding using
what we call the "replace" policy, and thus sys.argv[] will contain an
entry like "c:\\foo\\??????????????.dat".
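
The effect can be approximated in pure Python. This is only a sketch: it
assumes cp1252 as the "Western" system encoding, the path and the helper
name ansi_round_trip are made up for illustration, and Python's "replace"
error handler stands in for whatever the Windows ANSI conversion does
internally (which may also apply best-fit mappings for some characters).

```python
def ansi_round_trip(path, encoding="cp1252"):
    """Simulate passing a path through a lossy ANSI code page.

    Characters that the code page cannot represent come back as "?",
    which is what ends up in sys.argv in the scenario described above.
    """
    return path.encode(encoding, errors="replace").decode(encoding)


# Hypothetical Japanese-named file on a Western Windows installation.
original = "c:\\foo\\日本語のファイル名.dat"
mangled = ansi_round_trip(original)

print(mangled)              # every Japanese character is now "?"
print(mangled == original)  # False: the real name can no longer be recovered
```

Once the name has been flattened this way, the script has no means of
opening the file, because the information needed to rebuild the original
name is gone.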

My suggestion is that:

* At the Python level, we still expose a single sys.argv[], which will 
contain unicode strings. I think this exactly matches what Py3k does now. 

* At the C level, I believe it involves using GetCommandLineW() and 
CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be 
changed to also accept wchar_t** arguments? Or is it better to allow for 
NULL to be passed (under Windows at least), so that the Windows
code-path in there can use GetCommandLineW()/CommandLineToArgvW() to get
the current process' arguments?
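
For illustration, the same two Win32 calls can be exercised from Python via
ctypes. This is a sketch of the idea, not the proposed C change: the helper
name get_wide_argv is hypothetical, and the non-Windows fallback to
sys.argv is an assumption added only so the snippet runs everywhere.

```python
import ctypes
import sys


def get_wide_argv():
    """Return the process command line as a list of str.

    On Windows this queries the wide-character API directly. Elsewhere it
    falls back to sys.argv (assumption: only Windows needs this).
    """
    if sys.platform != "win32":
        return list(sys.argv)

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    shell32 = ctypes.WinDLL("shell32", use_last_error=True)

    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = wintypes.LPCWSTR

    shell32.CommandLineToArgvW.argtypes = [
        wintypes.LPCWSTR,
        ctypes.POINTER(ctypes.c_int),
    ]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)

    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(
        kernel32.GetCommandLineW(), ctypes.byref(argc)
    )
    if not argv:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Copy the strings out before releasing the array.
        return [argv[i] for i in range(argc.value)]
    finally:
        # CommandLineToArgvW allocates one block that the caller frees.
        kernel32.LocalFree(argv)


if __name__ == "__main__":
    for arg in get_wide_argv():
        print(repr(arg))
```

Note that GetCommandLineW() returns the full command line of the process,
so on Windows the list starts with the interpreter executable and any
interpreter options, not with the script name as sys.argv does.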

components: Interpreter Core
messages: 62458
nosy: giovannibajo
severity: normal
status: open
title: sys.argv is wrong for unicode strings
type: behavior
versions: Python 3.0

Tracker <report at bugs.python.org>
