[Python-3000] Unicode and OS strings

Jim Jewett jimjjewett at gmail.com
Fri Sep 21 17:01:24 CEST 2007


On 9/21/07, Paul Moore <p.f.moore at gmail.com> wrote:
> On 21/09/2007, Jim Jewett <jimjjewett at gmail.com> wrote:
> > (Outside ASCII), if you treat sys.argv as text, that is probably
> > impossible without filesystem support.  Before python even sees the
> > data, the terminal itself is allowed to change between canonical
> > equivalents, which have different binary representations.

> Please note - this statement is Unix specific. The situation on
> Windows is entirely different (the fact that the CRT on Windows
> emulates some aspects of the Unix semantics is not relevant here - you
> need to understand the underlying OS model).

No; it is a consequence of Unicode.  The command shell (or other
program launcher) has the same freedom to switch between canonical
equivalents.

If you are using text (as opposed to bytes), then À can be either
U+00C0 or <U+0041, U+0300>.  If the file system makes a distinction,
then it is using bytes, and any program interacting with it needs* to
use bytes too.

* To be correct; in practice, the problems will occur rarely enough
that most people won't notice.
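
A minimal sketch of the two spellings, using the standard unicodedata
module.  The variable and function names are made up for illustration;
this only shows the equivalence, not what any particular terminal,
shell, or file system actually does.

```python
import unicodedata

composed = "\u00c0"      # À as the single code point U+00C0
decomposed = "A\u0300"   # À as U+0041 followed by U+0300 (combining grave)

def same_text(a, b):
    """Hypothetical helper: compare two strings up to canonical equivalence."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# As text, the two spellings are canonically equivalent.
assert same_text(composed, decomposed)

# As code point sequences, and therefore as bytes, they differ.
assert composed != decomposed
composed_utf8 = composed.encode("utf-8")      # b'\xc3\x80'
decomposed_utf8 = decomposed.encode("utf-8")  # b'A\xcc\x80'
assert composed_utf8 != decomposed_utf8
```

A file system that stores and compares the raw bytes will treat these
as two different names, even though a text layer may hand back either
one.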

-jJ

