[Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292)

François Pinard pinard at iro.umontreal.ca
Wed Sep 15 01:58:19 CEST 2004

[Barry Scott]

> Then came ISO 10646 which gave every language its own unique set
> of code points.

Many languages at most.  That's far from "every language".  And some
languages, and not the least among them, were not satisfied with ISO
10646; many countries long resisted its adoption as a national standard.

> But ISO 10646 is not easy to process which led to the development of
> unicode [...]

ISO 10646 and Unicode converged.  Unicode was the work of an industry
consortium, while ISO 10646 was more in the realm of international
standards.  Why do you say that ISO 10646 was especially "not easy to
process"?

> that is easier to implement and work with but could not originally
> deal with all the code points required for all the world's languages.

Before the convergence, ISO 10646, more than Unicode, was designed for
many code points, and so ISO 10646 was more open to many languages.

> I believe that has been fixed; now you can have 32bit unicode.

Neither ISO 10646 nor Unicode is 32 bits.  The limit is 31 bits.

> From now on if you use unicode no language has an advantage, all are
> equal and software authors stand a chance to create international
> software.

English has a clear and definite advantage in Unicode, and this is
reflected in various Unicode-aware programs.  Taking Python as a mere
example, English texts may be converted from `unicode' to `str' without
raising an exception -- not many languages benefit from this property.
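
A minimal sketch of that asymmetry, written for present-day Python 3
rather than the Python of 2004.  It assumes that the implicit `unicode'
to `str' conversion of that era (which used the ASCII codec by default)
can be modelled by an explicit .encode("ascii").  The helper name and the
sample strings are made up for illustration.

```python
def encodes_as_ascii(text):
    """Return True if `text` can be converted to ASCII bytes without error."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample texts, one per language.
samples = {
    "English": "The quick brown fox",
    "French": "Fran\u00e7ois",               # c with cedilla
    "German": "Gr\u00fc\u00dfe",             # u with diaeresis, sharp s
    "Greek": "\u03b3\u03b5\u03b9\u03b1",     # four Greek letters
}

results = {name: encodes_as_ascii(text) for name, text in samples.items()}

for name, ok in results.items():
    print(name, "converts cleanly" if ok else "raises UnicodeEncodeError")
```

Only the English sample passes; every other sample contains at least one
character outside ASCII.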

Some languages have all their characters pre-combined in Unicode, and
these have the advantage over the others of needing only one code
point per character.  Languages introduced later met the established
resistance of Unicode (and W3C) to any new pre-combined characters, and
have to cope with zero-width diacritics, which induces purely artificial
complexities in programs.  Unicode might well have granted them the same
service as the early comers.
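
The difference can be observed with the standard unicodedata module.
This is only a sketch: "q" with an acute accent is an arbitrary stand-in
for a letter that has no pre-combined code point, not a claim about any
particular language, and the helper name is invented here.

```python
import unicodedata


def code_points(text):
    """Count code points after NFC normalization (composes where possible)."""
    return len(unicodedata.normalize("NFC", text))


# 'e' + COMBINING ACUTE ACCENT: a pre-combined form (U+00E9) exists.
e_acute = "e\u0301"
# 'q' + the same accent: no pre-combined form exists, so it stays as two.
q_acute = "q\u0301"

composed = unicodedata.normalize("NFC", e_acute)

print(code_points(e_acute), code_points(q_acute))
```

A program that assumes one code point per visible character handles the
first case after normalization, but not the second.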

And there are more complex or difficult things which are needed by
some languages when Unicoded, yet unneeded by the above languages;
directionality marks quickly come to mind.
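
As a small illustration, the standard unicodedata module exposes the
bidirectional class of each character.  The sketch below only inspects
those classes; it does not implement the reordering itself, and the
helper name and sample text are invented for the example.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


# U+200F is an invisible format character that forces right-to-left
# direction on neighbouring neutral characters.
RLM = "\u200f"

# A Latin letter followed by HEBREW LETTER ALEF.
mixed = "a\u05d0"

print(bidi_classes(mixed))
print(unicodedata.name(RLM), unicodedata.category(RLM))
```

Software that deals only with left-to-right scripts never has to look at
these classes, while software handling Hebrew or Arabic must.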

Software authors will support Unicode more or less deeply depending
on whether they target German, Hebrew or Korean.  I do not think most
American-centric applications will go very far in supporting Unicode.
For real and complete Unicode support, software authors are only equal
in the hell they have to suffer.  I hardly call this a "chance"! :-)

François Pinard   http://www.iro.umontreal.ca/~pinard
