I thought that others might find this reference interesting. It is Matz (the inventor of Ruby) talking about why he thinks that Unicode is good for what it does but not sufficient in general, along with some hints of what he plans for multinationalization in Ruby. The translation is rough and is lifted from this email:

<a href="http://rubyforge.org/pipermail/rhg-discussion/2006-April/000136.html">http://rubyforge.org/pipermail/rhg-discussion/2006-April/000136.html</a> I think that the gist of it is that Unicode will be &quot;just one character set&quot; supported by Ruby. This idea has been kicked around for Python before, but you quickly run into questions about how to compare character strings from multiple character sets, to say nothing of the complexity of a regular expression engine that is agnostic to both character encoding and character set.
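To make the comparison question concrete, here is a minimal Python sketch. `TaggedText` and `same_text_as` are hypothetical names invented for this illustration, not anything from Ruby or Python; the sketch only assumes the standard `shift_jis` and `euc_jp` codecs. It shows that the same text has different bytes in two encodings, so a string that carries its own encoding has to fall back on some common form (here Unicode) to compare across encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical CSI-style string: raw bytes plus the name of their encoding."""
    raw: bytes
    encoding: str

    def same_text_as(self, other: "TaggedText") -> bool:
        # Same encoding: compare bytes directly, no conversion needed.
        # (Assumes the encoding has one byte sequence per character.)
        if self.encoding == other.encoding:
            return self.raw == other.raw
        # Different encodings: use Unicode as a pivot. This is exactly the
        # step that fails for text Unicode cannot represent.
        return self.raw.decode(self.encoding) == other.raw.decode(other.encoding)


sjis = TaggedText("日本語".encode("shift_jis"), "shift_jis")
eucjp = TaggedText("日本語".encode("euc_jp"), "euc_jp")

print(sjis.raw == eucjp.raw)     # byte-level comparison across encodings
print(sjis.same_text_as(eucjp))  # comparison through the Unicode pivot
```

The design question for a real implementation is what to do in the second branch when no pivot character set covers both strings.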

I guess Matz is the right guy to experiment with that stuff. Maybe it could be copied in Python 4K. <pre>What are your complaints towards Unicode?<br>* it's thoroughly used, isn't it.<br>* resentment towards Han unification?<br>* inferiority complex of Japanese people?<br>--<br>What are your complaints towards Unicode?<br>* no, no I do not have any complaints about Unicode<br>* in the domains where Unicode is adequate<br>--<br>Then, why CSI?

<br><br>In most applications, UCS is enough thanks to Unicode.<br>However, there are also applications for which this is not the case.<br>--<br>Fields for which Unicode is not enough<br>Big character sets<br>* Konjaku-Mojikyo (Japanese encoding which includes many more than Unicode)

<br>* TRON code<br>* GB18030<br>--<br>Fields for which Unicode is not fitted<br>Legacy encodings<br>* conversion to UCS is useless<br>* big conversion tables<br>* round-trip problem<br>--<br>If a language chooses the UCS system

<br>* you cannot write non-UCS applications<br>* you can't handle text that can't be expressed with Unicode<br>--<br>If a language chooses the CSI system<br>* CSI is a superset of UCS<br>* Unicode just has to be handled in CSI

<br>--<br>... is what we can say but<br>* CSI is difficult<br>* can it really be implemented?<br>--<br>That's where comes out Japan's traditional arts<br><br>Adaptation for the Japanese language of applications<br>* Modification of English language applications to be able to process Japanese

<br>--<br>Adaptation for the Japanese language of applications<br><br>* What engineers of long ago experienced for sure<br>  - Emacs (NEmacs)<br>  - Perl (JPerl)<br>  - Bash<br>--<br>Accumulation of know-how<br><br>In Japan, the know-how of adaptation for the Japanese language

<br>(multi-byte text processing)<br>has been accumulated.<br>--<br>Accumulation of know-how<br><br>in the first place, just for local use,<br>text using 3 encodings circulate<br>(4 if including UTF-8)<br>--<br>Based on this know-how

<br>* multibyte text encodings<br>* switching between encodings at the string level<br>* processing them at practical speed<br>is finished<br>--<br>Available encodings<br><br>euc_tw   euc_jp   iso8859_*  utf-8     utf-32le

<br>ascii    euc_kr   koi8       utf-16le  utf-32be<br>big5     gb2312   sjis       utf-16be<br><br>...and many others<br>If it's a stateless encodings, in principle it can be available.<br>--<br>It means<br>For applications using only one encoding, code conversion is not needed

<br>--<br>Moreover<br>Applications wanting to handle multiple encodings can choose an<br>internal encoding (generally Unicode) that includes all others<br>--<br>If you want to<br>* you can also handle multiple encodings without conversion, letting

<br>characters as they are<br>* but this is difficult so I do not recommend it<br>--<br>However,<br>only the basic part is done,<br>it's far from being ready for practical use<br>* code conversion<br>* guessing encoding<br>

* etc.<br>--<br>For the time being, today<br>I want to tell everyone:<br>* UCS is practical<br>* but not all-purpose<br>* CSI is not impossible<br>--<br>The reason I'm saying that<br>They may add CSI in Perl6 as they had added

<br>* Methods called by &quot;.&quot;<br>* Continuations<br>from Ruby.<br>Basically, they hate losing.<br>--<br>Thank you</pre><br>
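For anyone wondering what the "round-trip problem" in the slides looks like in practice, here is a small Python sketch. It assumes Python's `cp932` codec follows Microsoft's mapping table, in which the NEC extension bytes `0x8790` and the JIS X 0208 bytes `0x81E0` both map to the same Unicode character (U+2252); the helper name `round_trips` is made up for this example.

```python
# Two distinct CP932 byte sequences that are assumed to decode to the
# same Unicode character (U+2252, APPROXIMATELY EQUAL TO OR THE IMAGE OF).
nec_row13 = b"\x87\x90"   # NEC extension area
jis_x0208 = b"\x81\xe0"   # standard JIS X 0208 area


def round_trips(raw: bytes, encoding: str) -> bool:
    """Return True if decoding to Unicode and encoding back yields the same bytes."""
    return raw.decode(encoding).encode(encoding) == raw


same_char = nec_row13.decode("cp932") == jis_x0208.decode("cp932")
both_survive = round_trips(nec_row13, "cp932") and round_trips(jis_x0208, "cp932")

print(same_char, both_survive)
```

If both byte sequences decode to one character, the encoder can only pick one of them on the way back, so at least one original byte sequence is lost. An application that stays in its legacy encoding never hits this.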