[Python-3000] Windows, sys.argv and unicode

Giovanni Bajo rasky at develer.com
Sat Feb 16 16:20:28 CET 2008


Hello,

CPython 2.x (and 3.x) under Win32 has an issue with sys.argv. The list is 
computed using the ANSI version of the Windows APIs[*]. The problem 
becomes apparent when a file or directory name can't be represented in 
the system encoding (e.g. a Japanese-named file or directory on a Western 
Windows), because the Windows ANSI API will encode the filename to the 
system encoding using what we call the "replace" policy, and sys.argv[] 
will contain an entry like "c:\\foo\\??????????????.dat".
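
To make the effect concrete, here is a minimal sketch that imitates the 
lossy conversion from within Python. The filename is made up for 
illustration, and cp1252 merely stands in for a Western system code page; 
Python's "replace" error handler is only an approximation of what the 
Windows ANSI API does internally.

```python
# Hypothetical filename; cp1252 is assumed here as the Western "ANSI" code page.
path = "c:\\foo\\日本語.dat"

# Encode with the "replace" policy, then decode again, roughly as a
# round trip through the ANSI API would do.
lossy = path.encode("cp1252", errors="replace").decode("cp1252")

print(lossy)          # every unrepresentable character became "?"
print(lossy == path)  # the original name can no longer be recovered
```

Once the characters have been replaced, the resulting string no longer 
names any existing file, so open() on it is bound to fail.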

At the moment, there's simply no way of passing such a file to a Python 
script/application as an argument (e.g. if you double-click on that file, 
and the file is associated with a Python application). This is a 
widespread problem among Python applications; e.g. if you click on a 
Japanese .torrent file, ABC (a BitTorrent client written in Python) won't 
be able to open it and will complain "cannot access 
file ??????????.torrent".

I understand that fixing this properly in the 2.x series might have 
backward compatibility issues, but I propose that this be fixed at least 
in the Python 3.x series, and I volunteer to write a patch. I would be 
glad if an expert in ANSI/Unicode/Windows (MvL?) would show me 
what he believes to be the correct way of approaching this problem. 

My suggestion is that:

* At the Python level, we still expose a single sys.argv[], which will 
contain unicode strings. I think this exactly matches what Py3k does now. 
(Some time ago, there were proposals to add a sys.argvu, but I guess 
that no longer makes sense).
* At the C level, I believe it involves using GetCommandLineW() and 
CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be 
changed to also accept wchar_t** arguments? Or is it better to allow for 
NULL to be passed (under Windows at least), so that the Windows code-path 
in there can use GetCommandLineW()/CommandLineToArgvW() to get the 
current process' arguments?
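
As a rough illustration of the two API calls mentioned above, here is a 
sketch that reaches them from Python through ctypes rather than from C. 
The function name get_wide_argv is hypothetical, and the non-Windows 
fallback is just an assumption to keep the example importable everywhere; 
it is not meant as the actual patch.

```python
import sys

def get_wide_argv():
    """Hypothetical helper: return the process arguments as str objects."""
    if sys.platform != "win32":
        # Assumption for this sketch: elsewhere, fall back to sys.argv.
        return list(sys.argv)

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    shell32 = ctypes.WinDLL("shell32", use_last_error=True)

    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = wintypes.LPCWSTR
    shell32.CommandLineToArgvW.argtypes = [
        wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)
    kernel32.LocalFree.argtypes = [wintypes.HLOCAL]
    kernel32.LocalFree.restype = wintypes.HLOCAL

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(
        kernel32.GetCommandLineW(), ctypes.byref(argc))
    if not argv:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Copy the wide strings out before releasing the array.
        return [argv[i] for i in range(argc.value)]
    finally:
        # The array returned by CommandLineToArgvW is freed with LocalFree.
        kernel32.LocalFree(argv)

args = get_wide_argv()
print(args)
```

Note that GetCommandLineW() returns the command line of the whole 
process, so on Windows the list starts with the interpreter executable 
and any interpreter options, which sys.argv does not include. A real 
patch would have to do that trimming at the C level.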

Thanks!

[*] In detail: it actually comes from __argc/__argv (see WinMain.c), 
which in turn are computed by the CRT startup code. The CRT would adapt 
to the ANSI/Unicode choice made at build time, but Python is currently 
compiled in ANSI mode.
-- 
Giovanni Bajo


