An assessment of the Unicode standard

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Aug 30 07:46:12 CEST 2009


On Sun, 30 Aug 2009 03:07:17 +0000, Neil Hodgson wrote:

>    Not sure if you are referring to the ☃ snowman character or Arctic
> region languages like Canadian Aboriginal syllabic writing like ᐲᐦᒑᔨᕽ
> which were added to Unicode 8 years after the initial version. I'd guess
> that was added from political rather than marketing motives. ☃ was
> required since it was present in Japanese character sets.


If I recall correctly, the snowman was specifically added at the request 
of Japanese television producers, because it is a standard glyph used for 
representing snow when showing the weather on TV.

Unicode's stated aim is to have a single universal standard for all 
characters needed for communication. From the Unicode Consortium:

[quote]
What is Unicode?
Unicode provides a unique number for every character, no matter what the 
platform, no matter what the program, no matter what the language.

...
Even for a single language like English no single encoding was adequate 
for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two 
encodings can use the same number for two different characters, or use 
different numbers for the same character. Any given computer (especially 
servers) needs to support many different encodings; yet whenever data is 
passed between different encodings or platforms, that data always runs 
the risk of corruption.

Unicode is changing all that!

Unicode provides a unique number for every character, no matter what the 
platform, no matter what the program, no matter what the language.
[end quote]
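
To make the "same number, different characters" point concrete, here is a
minimal Python 3 sketch. The byte value 0x93 and the three legacy code
pages are my own choice of illustration, not anything taken from the
Unicode Consortium's text; the codec names are the ones that ship with
Python.

```python
import unicodedata

# Same number, different characters: one byte value decoded under three
# legacy code pages (an arbitrary selection, chosen for illustration).
raw = b"\x93"
legacy = {enc: raw.decode(enc) for enc in ("cp1252", "mac_roman", "cp437")}
for enc, ch in legacy.items():
    print(enc, hex(ord(ch)), unicodedata.name(ch))

# Different numbers, same character: the left curly quote has a different
# byte value in each code page that contains it.
left_quote = "\u201c"
as_bytes = {enc: left_quote.encode(enc) for enc in ("cp1252", "mac_roman")}
print(as_bytes)

# The Unicode code point itself does not depend on platform or encoding
# form: UTF-8 and UTF-16 both decode back to the same character.
snowman = "\u2603"
print(hex(ord(snowman)), unicodedata.name(snowman))
via_utf8 = snowman.encode("utf-8").decode("utf-8")
via_utf16 = snowman.encode("utf-16").decode("utf-16")
```

The legacy code pages disagree about what 0x93 means, while U+2603 is the
snowman no matter which Unicode encoding form carried it.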

And from the FAQs:

[quote]
Unicode covers all the characters for all the writing systems of the 
world, modern and ancient. It also includes technical symbols, 
punctuations, and many other characters used in writing text.
[end quote]


It's not just about supporting languages used by foreigners too stupid to
speak English (sarcasm!). It's about supporting business users who want a
standard way of referring to dingbats and pictographs, historians who
need to deal with ancient writings and obsolete characters, scientists
and mathematicians who want to use mathematical symbols, editors and book
publishers who want to use their own typographic symbols, people who need
Braille or musical notation, and even TV producers who want to include
snowmen on their weather charts.

The Unicode system replaces dozens of incompatible, clashing systems with 
a single universal, extensible system. Why would anyone want to go back 
to the Bad Old Days where you couldn't transfer data from one OS to 
another, or even from one application to another, without quote marks 
turning into mathematical symbols or boxes?
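
For anyone who never lived through those days, the failure is easy to
reproduce. This is only a sketch: the `transfer` helper is a hypothetical
name of mine, and the pairing of cp1252 on the writing side with
mac_roman on the reading side is an assumed scenario, not a claim about
any particular pair of applications.

```python
text = "\u201cHello\u201d"  # Hello wrapped in curly double quotes


def transfer(s, written_as, read_as):
    """Encode on the sending side, then decode on the receiving side."""
    return s.encode(written_as).decode(read_as)


# Writer and reader disagree about the encoding: the quote marks are
# silently replaced by other characters, while the ASCII letters survive.
garbled = transfer(text, "cp1252", "mac_roman")

# Both sides agree on a Unicode encoding: the text round-trips unchanged.
intact = transfer(text, "utf-8", "utf-8")

print(repr(garbled))
print(repr(intact))
```

Nothing raises an exception in the mismatched case, which is exactly what
made this kind of corruption so hard to notice until a human read the
result.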



-- 
Steven


