[Python-3000] Unicode and OS strings
Jim Jewett
jimjjewett at gmail.com
Sat Sep 22 21:11:34 CEST 2007
On 9/22/07, martin at v.loewis.de <martin at v.loewis.de> wrote:
> Quoting Jim Jewett <jimjjewett at gmail.com>:
>
> > On 9/21/07, Paul Moore <p.f.moore at gmail.com> wrote:
> >> On 21/09/2007, Jim Jewett <jimjjewett at gmail.com> wrote:
[The original context, laid out in some detail by Michael Urman in
http://mail.python.org/pipermail/python-3000/2007-September/010621.html
was that it must be possible to treat command line arguments as filenames.]
> >> > (Outside ASCII), if you treat sys.argv as text, that is probably
> >> > impossible without filesystem support. Before python even sees the
> >> > data, the terminal itself is allowed to change between canonical
> >> > equivalents, which have different binary representations.
> > No; it is a consequence of Unicode. The command shell (or other
> > program launcher) has the same freedom.
> I'm not quite sure what you are talking about here (what "same"
> freedom?),
The same freedom to represent À as either the precomposed U+00C0 or
the decomposed sequence <U+0041, U+0300>.
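
A minimal sketch of that equivalence, using the standard unicodedata
module. The variable names are only illustrative, and UTF-8 is assumed
as the encoding purely to show that the two forms differ as bytes:

```python
import unicodedata

precomposed = "\u00C0"        # À as a single code point
decomposed = "\u0041\u0300"   # A followed by COMBINING GRAVE ACCENT

# The two strings are canonically equivalent but not equal code point by code point.
same_code_points = (precomposed == decomposed)

# Normalizing maps one form onto the other.
nfc_matches = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_matches = unicodedata.normalize("NFD", precomposed) == decomposed

# Assumption: UTF-8 as the byte encoding. The byte sequences differ.
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(same_code_points, nfc_matches, nfd_matches)
print(precomposed_bytes, decomposed_bytes)
```

Anything that normalizes text between the keyboard and the program is
free to swap one form for the other without changing the "character".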
> argc/argv does not exist on Windows (that you seem to see it
> anyway is an illusion), and if it did exist, it would be characters,
> not bytes. "Canonical equivalents" is not a property of bytes,
> but of Unicode characters (code points specifically).
> Also, I'm not quite sure why you think the file system has
> to do anything with sys.argv (unless your understanding of
> what a "filesystem" is differs from mine).
The filesystem is unrelated to sys.argv, except for the need to pass
filenames through argv. If the filesystem is using bytes rather than
characters, then sys.argv must offer the same option, or else certain
scripts will (under some rare circumstances) fail.
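
A rough sketch of the failure mode, under stated assumptions: the
"filesystem" here is just a dict keyed by raw bytes, and the "launcher"
is assumed to hand the argument over as NFC-normalized text. Neither is
a real API; both are stand-ins to show where the lookup breaks.

```python
import unicodedata

# Hypothetical byte-oriented filesystem: names are compared as raw bytes.
stored_name = "A\u0300.txt".encode("utf-8")   # decomposed form on disk
directory = {stored_name: "file contents"}

# Hypothetical launcher that delivers the argument as text, normalized to NFC.
argv_text = unicodedata.normalize("NFC", stored_name.decode("utf-8"))

# Re-encoding the text yields different bytes than the stored name.
lookup_via_text = argv_text.encode("utf-8")
found_via_text = lookup_via_text in directory

# Passing the original bytes through untouched still finds the file.
found_via_bytes = stored_name in directory

print(found_via_text, found_via_bytes)
```

With the text route the name no longer matches what is on disk; with
the bytes route it does, which is why a bytes option for sys.argv
matters when the filesystem itself works in bytes.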
-jJ