[Python-3000] Unicode and OS strings
"Martin v. Löwis"
martin at v.loewis.de
Fri Sep 28 23:00:29 CEST 2007
> msvcrt ships with the operating system - I'd call that a conforming
Yes, but it's not part of the operating system interface; Microsoft
documents it as "for future use only by system-level components".
> I still regard handling argv as anything other than the raw bytes that come
> from the host as bad.
The point is that you cannot use "raw bytes" in Win32, not without
potential loss of data. If you pass arbitrary bytes to os.spawn*,
they get converted to Unicode, and the resulting Unicode command
line gets passed to the child process. So the *native* API is
Unicode, not arbitrary bytes. The C library also supports wmain,
if you want broken-down command line arguments without character
set conversions.
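To illustrate the kind of data loss meant here, a minimal Python sketch follows. It is only an approximation: the function name round_trip and the choice of cp1252 are assumptions for the example, and Python's codec stands in for whatever conversion the Win32 layer actually performs, which differs in detail.

```python
def round_trip(raw, codepage="cp1252"):
    """Decode bytes to text and encode them back, the way a
    bytes -> Unicode -> bytes hand-off between processes would.

    The codepage default is an assumption for illustration only.
    """
    # Bytes with no mapping in the code page become U+FFFD here.
    text = raw.decode(codepage, errors="replace")
    # U+FFFD has no cp1252 encoding, so it becomes "?" on the way back.
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    valid = b"caf\xe9"           # 0xE9 is e-acute in cp1252
    arbitrary = b"file\x81.txt"  # 0x81 is undefined in Python's cp1252 codec

    print(round_trip(valid) == valid)          # valid text survives
    print(round_trip(arbitrary) == arbitrary)  # arbitrary bytes do not
```

Bytes that are valid text in the code page survive the trip; bytes that are not come back altered, which is why "raw bytes" cannot be the native representation here.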
> If we're going to call something
> sys.argv, then presumably that was done because there was a
> conventionally accepted meaning to it, and I would argue that meaning
> comes from standard C.
Yes, but also in C, the meaning is "characters", not "bytes". ISO
C 99 5.1.2.2.1p2 specifies they are *strings* passed by the host
environment, and elaborates that if the host environment is not
capable of supplying mixed-case strings, it should convert
them all into lower case. So the intention clearly is that argv
is text, not bytes.
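For reference, a small sketch of how the "argv is text" view looks from Python: in released Python 3 versions the elements of sys.argv are str objects. The helper name child_argv_types is hypothetical and exists only for this example; it starts a child interpreter and reports the type of each argument the child receives.

```python
import subprocess
import sys


def child_argv_types(*args):
    """Return the type name of each command line argument as seen
    by a child Python interpreter (hypothetical helper)."""
    # With -c, sys.argv[0] is "-c", so the real arguments start at index 1.
    code = "import sys; print(' '.join(type(a).__name__ for a in sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    print(child_argv_types("hello", "World"))
```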