[Python-3000] Unicode and OS strings
Jim Jewett
jimjjewett at gmail.com
Fri Sep 21 17:01:24 CEST 2007
On 9/21/07, Paul Moore <p.f.moore at gmail.com> wrote:
> On 21/09/2007, Jim Jewett <jimjjewett at gmail.com> wrote:
> > (Outside ASCII), if you treat sys.argv as text, that is probably
> > impossible without filesystem support. Before python even sees the
> > data, the terminal itself is allowed to change between canonical
> > equivalents, which have different binary representations.
> Please note - this statement is Unix specific. The situation on
> Windows is entirely different (the fact that the CRT on Windows
> emulates some aspects of the Unix semantics is not relevant here - you
> need to understand the underlying OS model).
No; it is a consequence of Unicode. The command shell (or other
program launcher) has the same freedom.
If you are using text (as opposed to bytes), then À can be either
U+00C0 or <U+0041, U+0300>. If the file system makes a distinction,
then it is using bytes, and any program interacting with it needs* to
use bytes too.
* To be correct; in practice, the problems will occur rarely enough
that most people won't notice.
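
A minimal sketch of the two spellings, assuming Python 3 str semantics
and the standard unicodedata module. The helper name same_text is only
illustrative; it is not an existing API.

```python
import unicodedata

precomposed = "\u00c0"   # À as the single code point U+00C0
decomposed = "A\u0300"   # U+0041 followed by U+0300 COMBINING GRAVE ACCENT


def same_text(a, b):
    """Compare two strings as text, i.e. up to canonical equivalence.

    Hypothetical helper: both sides are normalized to NFC first.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A code-point-by-code-point comparison sees two different strings ...
print(precomposed == decomposed)            # False
# ... but they are canonically equivalent, so as text they are the same.
print(same_text(precomposed, decomposed))   # True

# The encoded forms differ, which is what a byte-oriented file system sees.
print(precomposed.encode("utf-8"))          # b'\xc3\x80'
print(decomposed.encode("utf-8"))           # b'A\xcc\x80'
```

Anything that compares the encoded bytes will treat these as two
different names, even though a terminal or launcher may hand over
either one.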
-jJ