[Python-3000] Unicode and OS strings

"Martin v. Löwis" martin at v.loewis.de
Fri Sep 28 23:00:29 CEST 2007


> msvcrt ships with the operating system - I'd call that a conforming
> implementation.

Yes, but it's not part of the operating system interface; Microsoft
documents it as "for future use only by system-level components".

> I still regard handling argv as anything other than the raw bytes that
> come from the host as bad.

The point is that you cannot use "raw bytes" in Win32, not without
potential loss of data. If you pass arbitrary bytes to os.spawn*,
they get converted to Unicode, and the resulting Unicode command
line gets passed to the child process. So the *native* API is
Unicode, not arbitrary bytes. The C library also supports wmain,
if you want the command line broken down into arguments without
character set conversions.
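
To illustrate the kind of data loss meant here, a minimal sketch follows.
It uses Python's own cp1252 codec as a stand-in for the conversion; that
is an assumption for illustration only, since the actual Win32 conversion
uses the system code page and may map individual bytes differently. The
helper name survives_round_trip is hypothetical.

```python
def survives_round_trip(raw: bytes, codepage: str = "cp1252") -> bool:
    """Return True if raw decodes to text and encodes back unchanged."""
    # Bytes the codec cannot map are replaced with U+FFFD while decoding,
    # so the original byte value is gone by the time we encode again.
    text = raw.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace") == raw


if __name__ == "__main__":
    # 0xE9 is a defined cp1252 character; 0x81 is undefined in Python's
    # cp1252 table (assumption: the default code page is cp1252).
    for sample in (b"caf\xe9", b"caf\x81"):
        print(sample, survives_round_trip(sample))
```

Any byte sequence that is not valid in the chosen code page fails the
round trip, which is why "raw bytes" cannot be passed through a Unicode
command line unchanged.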

> If we're going to call something
> sys.argv, then presumably that was done because there was a
> conventionally accepted meaning to it, and I would argue that meaning
> comes from standard C.

Yes, but also in C, the meaning is "characters", not "bytes". ISO
C 99 5.1.2.2.1p2 specifies they are *strings* passed by the host
environment, and elaborates that if the host environment
is not capable of supplying mixed-case strings, it should convert
them all into lower case. So the intention clearly is that argv[]
is text, not bytes.
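
For comparison, here is a small sketch of what "argv is text" looks like
from Python 3 as it eventually shipped, where sys.argv holds str objects.
The helper name argv_type_names and the child one-liner are made up for
this example; it only assumes that sys.executable points at a Python 3
interpreter.

```python
import subprocess
import sys


def argv_type_names(*args: str) -> list[str]:
    """Run a child interpreter and report the type of each argument it sees."""
    # The child prints the type name of every entry in sys.argv[1:].
    child_code = (
        "import sys; "
        "print(' '.join(type(a).__name__ for a in sys.argv[1:]))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    print(argv_type_names("abc", "MixedCase"))
```

The child receives each argument as a character string, not as a bytes
object, which matches the reading of the C standard given above.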

Regards,
Martin


