[Python-3000] Windows, sys.argv and unicode

Guido van Rossum guido at python.org
Sat Feb 16 16:49:22 CET 2008


Thanks for reminding us of this!

Is there any chance that you can submit a patch (or even the
beginnings of one) for this?

At the very least, can you add the issue to the tracker so we'll have
a permanent reminder until it's resolved?

On Feb 16, 2008 7:20 AM, Giovanni Bajo <rasky at develer.com> wrote:
> Hello,
>
> CPython 2.x (and 3.x) under Win32 has an issue with sys.argv. The list is
> computed using the ANSI version of the Windows APIs[*]. The problem is
> apparent when you have a file/directory which can't be represented in the
> system encoding (e.g. a Japanese-named file or directory on a Western
> Windows), because the Windows ANSI API will encode the filename to the
> system encoding using what we call the "replace" policy, and sys.argv[]
> will contain an entry like "c:\\foo\\??????????????.dat".
>
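
The lossy conversion described above can be approximated in pure Python.
This is only a sketch: the helper name ansi_round_trip is hypothetical,
cp1252 is assumed as the code page of a Western Windows install, and
Python's "replace" error handler stands in for the conversion that the
Windows ANSI API performs.

```python
def ansi_round_trip(name, codepage="cp1252"):
    """Encode a filename to an ANSI code page and back, the lossy way.

    Characters that the code page cannot represent become "?", which is
    the kind of entry that ends up in sys.argv in the scenario above.
    """
    return name.encode(codepage, errors="replace").decode(codepage)


# "\u65e5\u672c\u8a9e" is a three-character Japanese word (assumed sample name).
original = "c:\\foo\\\u65e5\u672c\u8a9e.dat"
print(ansi_round_trip(original))  # c:\foo\???.dat
```

Once the name has been flattened to question marks, the original
characters cannot be recovered, so the script has no way to open the file.
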
> At the moment, there's simply no way of passing such a file to a Python
> script/application as an argument (e.g. if you double-click on that file,
> and the file is associated with a Python application). This is a
> widespread problem among Python applications; e.g. if you click on a
> Japanese .torrent file, ABC (a BitTorrent client written in Python) won't
> be able to open it and will complain "cannot access
> file ??????????.torrent".
>
> I understand that fixing this properly in the 2.x series might have
> backward compatibility issues, but I propose that this be fixed at least
> in the Python 3.x series, and I volunteer to write a patch. I would be
> glad if an expert in ANSI/Unicode/Windows (MvL?) would show me
> what he believes to be the correct way of approaching this problem.
>
> My suggestion is that:
>
> * At the Python level, we still expose a single sys.argv[], which will
> contain unicode strings. I think this exactly matches what Py3k does now.
> (Back in the day, there were proposals to add a sys.argvu, but I guess
> that does not make sense right now.)
> * At the C level, I believe it involves using GetCommandLineW() and
> CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be
> changed to also accept wchar_t** arguments? Or is it better to allow for
> NULL to be passed (under Windows at least), so that the Windows code-path
> in there can use GetCommandLineW()/CommandLineToArgvW() to get the
> current process' arguments?
>
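
The two wide-character calls named above can be tried from Python through
ctypes, without touching the C sources. This is a rough sketch and not the
proposed patch: the function name wide_argv is hypothetical, and it returns
None on platforms other than Windows. Note that the result covers the whole
command line, so it also includes the interpreter path and any interpreter
options, not only the entries that end up in sys.argv.

```python
import ctypes
import sys


def wide_argv():
    """Return the process arguments as a list of str using the wide APIs.

    Returns None when not running on Windows.
    """
    if sys.platform != "win32":
        return None

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    shell32 = ctypes.WinDLL("shell32", use_last_error=True)

    # Declare the signatures so pointers are not truncated on 64-bit builds.
    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = wintypes.LPCWSTR
    shell32.CommandLineToArgvW.argtypes = [
        wintypes.LPCWSTR,
        ctypes.POINTER(ctypes.c_int),
    ]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(
        kernel32.GetCommandLineW(), ctypes.byref(argc)
    )
    if not argv:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Copy each wide string into a Python str before freeing the block.
        return [argv[i] for i in range(argc.value)]
    finally:
        kernel32.LocalFree(ctypes.cast(argv, ctypes.c_void_p))


if __name__ == "__main__":
    print(wide_argv())
```
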
> Thanks!
>
> [*] In detail: it actually comes from __argc/__argv (see WinMain.c),
> which in turn are computed by the CRT startup code, which would adapt to
> the user's choice, but Python is compiled in ANSI mode.
> --
> Giovanni Bajo
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

