[Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292)

"Martin v. Löwis" martin at v.loewis.de
Wed Sep 15 00:03:05 CEST 2004


Barry Scott wrote:
> Then came ISO 10646, which gave every language its own unique set
> of code points. But ISO 10646 is not easy to process, which led to the
> development of Unicode, which is easier to implement and work with but
> could not originally deal with all the code points required for all the
> world's languages.

I think this is historically incorrect. ISO 10646 and Unicode were
developed in lock-step, and the very first publication of ISO 10646
(in 1993) had precisely the same character assignments as Unicode 1.1.
Ever since then, the two standards have been roughly the same.

> I believe that has been fixed now that you can have 32-bit Unicode.

This is also incorrect. Unicode's code space now spans roughly 20.09 bits
(U+0000 through U+10FFFF). ISO 10646 used to have 32 bits, but now also
restricts itself to 20.09 bits. There are encodings of it which take four
octets per code point.
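
As a quick check of that arithmetic, here is a minimal Python 3 sketch. It
assumes the code space runs from U+0000 through U+10FFFF (17 planes of
65,536 code points each) and uses UTF-32 as one example of a four-octet
encoding; the variable names are only illustrative.

```python
import math

# Assumption: the code space is U+0000 .. U+10FFFF inclusive.
CODE_POINTS = 0x10FFFF + 1           # 17 planes * 65,536 = 1,114,112

# Bits needed to number every code point: 16 + log2(17).
bits = math.log2(CODE_POINTS)
print(round(bits, 2))

# UTF-32 spends four octets on every code point, even the highest one.
utf32 = chr(0x10FFFF).encode("utf-32-be")
print(len(utf32))
```

So the 20.09 is simply log2(1,114,112), and the four-octet encodings spend
a full 32 bits on values that never need more than 21.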

> Somewhere in the code point space you have to have ASCII. I'd be charitable
> and say that it's pragmatic that it's in code page 0 given the history of
> the computer industry.

Strictly speaking, this is group 0, plane 0, row 0 (actually, only the
first 128 cells of this row).
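
To illustrate the addressing, here is a rough Python 3 sketch that splits
a code point into those four coordinates. It assumes the four-octet
canonical form of ISO 10646, with a 7-bit group and 8-bit plane, row and
cell; the function name ucs4_position is made up for this example.

```python
def ucs4_position(code_point):
    """Split a code point into (group, plane, row, cell).

    Hypothetical helper; assumes the four-octet canonical form with
    a 7-bit group and 8 bits each for plane, row and cell.
    """
    group = (code_point >> 24) & 0x7F
    plane = (code_point >> 16) & 0xFF
    row = (code_point >> 8) & 0xFF
    cell = code_point & 0xFF
    return group, plane, row, cell

# Every ASCII character lands in group 0, plane 0, row 0, cells 0..127.
print(ucs4_position(ord("A")))
print(ucs4_position(0x7F))
```

Cells 128 through 255 of the same row are outside ASCII, which is why only
half of the row counts.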

> From now on, if you use Unicode, no language has an advantage;
> all are equal, and software authors stand a chance of creating
> international software.

... assuming encodings are the only issue in creating international
software.

Regards,
Martin

