[Python-3000] Unicode and OS strings

Stephen J. Turnbull stephen at xemacs.org
Wed Sep 19 07:00:51 CEST 2007


James Y Knight writes:

 > iso-2022 or some other abomination. This has upsides (simple, doesn't  
 > trample on PUA codepoints, only needs one new codec, never throws  
 > exception in the above example, and really is correct much of the  
 > time), and downsides (if the system locale is iso-2022, and all the  
 > filenames you're dealing with really are also properly encoded in  
 > iso-2022, it might be nice if they decoded into the sensible unicode  
 > string, instead of a non-sensical (but still round-trippable) one).

ISO 2022, like Unicode, is an extensible standard.  Corporate
character sets in Asia extend it, but they are not easy to distinguish
from each other, even though they often conflict.  They're not proper
in the sense that they abuse the registered final bytes of the
national standards they're based on, but it's also not reasonable for
those of us who live there to ignore them.

 > I think the advantages outweigh the disadvantages, but the world I  
 > live in, using anything other than UTF8 or ASCII is grounds for entry  
 > into an insane asylum. ;)

You're very fortunate.  In the world I live in, Shift JIS, which isn't
even ISO 2022 compatible, is mandated by a power higher even than the
Borg of Redmond: the telephone company.
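
To make "isn't even ISO 2022 compatible" concrete, here is a minimal
sketch, assuming Python's built-in "shift_jis" codec and one sample
character of my own choosing.  ISO 2022 keeps bytes 0x80-0x9F for C1
controls and expects the bytes of a multibyte character to come from
a single graphic range; Shift JIS uses lead bytes in the C1 range and
trail bytes that can coincide with ASCII:

```python
# Illustrative only: the sample character is an arbitrary choice,
# and the codec is Python's built-in "shift_jis".
text = "\u8868"  # U+8868, a common kanji
encoded = text.encode("shift_jis")

lead, trail = encoded[0], encoded[1]

# The lead byte lands in 0x80-0x9F, which ISO 2022 reserves for C1 controls.
lead_in_c1 = 0x80 <= lead <= 0x9F

# The trail byte lands in the 7-bit range; here it equals ASCII backslash.
trail_is_ascii = trail < 0x80

print(encoded.hex(), lead_in_c1, trail_is_ascii)
```

Any byte-oriented code that treats 0x5C as a backslash (a path
separator or an escape character, say) will misread the second byte
of such a character, which is one reason the encoding is so painful
in filenames.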


