[Python-Dev] unicode/string asymmetries

Mark Hammond mhammond@skippinet.com.au
Fri, 11 Jan 2002 10:11:56 +1100


> > windows "ansi" is an alias for the encoding you get from
> >
> >     import locale
> >     language, encoding = locale.getdefaultlocale()
> >
> > for people in western europe/north america
>
> Isn't that also known as "mbcs" in Python? And it is different from
> "oem", which is not exposed to Python, right?

<gulp> My turn to speak of that which I do not really understand :)

mbcs is an "encoding", but a strange one in that it depends on the
character set: the character set itself determines which bytes are lead
bytes.
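To make the lead-byte point concrete, here is a small Python 3 sketch. It
does not use the Windows-only "mbcs" codec; instead it assumes cp932
(Japanese) and cp1252 (Western) as stand-ins for the code page mbcs would
follow on two differently configured machines.

```python
# Assumption: cp932 and cp1252 stand in for the current code page that
# "mbcs" would use on a Japanese and a Western Windows machine.
data = b"\x83\x41"

# In cp932, 0x83 is a lead byte, so 0x83 0x41 forms ONE double-byte character.
japanese = data.decode("cp932")

# cp1252 has no lead bytes: every byte is a character on its own.
western = data.decode("cp1252")

print([hex(ord(c)) for c in japanese])  # ['0x30a2']  (KATAKANA LETTER A)
print([hex(ord(c)) for c in western])   # ['0x192', '0x41']  ("ƒA")
```

The bytes are identical; only the code page decides whether 0x83 starts a
two-byte sequence.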

Thus, the same mbcs byte string may be interpreted differently depending on
the current character set/code page.  So "ansi" and "oem" are code pages,
whereas mbcs is an encoding.
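As a rough illustration that "ansi" and "oem" name different code pages,
the sketch below assumes a US-English Windows install, where the ANSI code
page is 1252 and the OEM (console) code page is 437. The same single byte
means different things under each.

```python
# Assumption: US-English Windows, ANSI code page = 1252, OEM code page = 437.
raw = b"\x82"

ansi_text = raw.decode("cp1252")  # U+201A SINGLE LOW-9 QUOTATION MARK
oem_text = raw.decode("cp437")    # U+00E9 LATIN SMALL LETTER E WITH ACUTE

print(hex(ord(ansi_text)), hex(ord(oem_text)))  # 0x201a 0xe9
```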

This is why Neil demonstrated problems referencing (say) a Japanese filename
when the current code page is not Japanese: such a name only has a valid
mbcs representation in code pages that can represent its characters.
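A minimal Python 3 sketch of that failure, again assuming cp932 and cp1252
as stand-ins for a Japanese and a Western current code page. The filename
and the helper `representable` are hypothetical, made up for illustration.

```python
# Hypothetical filename containing Japanese characters ("日本語.txt").
name = "\u65e5\u672c\u8a9e.txt"

def representable(text, codepage):
    """Hypothetical helper: True if text survives a round trip through codepage."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeEncodeError:
        return False

print(representable(name, "cp932"))   # True: Japanese code page has these characters
print(representable(name, "cp1252"))  # False: Western code page does not

# A lossy conversion silently replaces the unrepresentable characters.
lossy = name.encode("cp1252", errors="replace")
print(lossy)  # b'???.txt'
```

The lossy bytes no longer name the original file, which is the kind of
breakage seen when a byte-string API is used under the wrong code page.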

Mark.