Unicode from Web to MySQL
francisgavila at yahoo.com
Thu Dec 25 04:57:33 CET 2003
John Benson wrote in message ...
>Making my way through the list, I was very happy to see Francis Avila's
>discussion of Unicode. I was trying to work up enough courage to ask for
>something like this, and was very happy to see it appear spontaneously. My
Glad to help.
>only quibble would be to characterize Unicode as a mapping from numbers to
>glyphs (instead of letters), since symbols from foreign alphabets sometimes
>look more like squashed bugs than letters.
See, I think neither letter nor glyph is a good word for it. Letter is too
narrow, since unicode contains much more than letters, namely numbers,
symbols, diacriticals, presentation forms, etc. But "glyphs" is too
specific because it refers only to the shape, and unicode isn't a mapping of
numbers to meaningful shapes, but of numbers to the meanings those shapes
have. Even if the standard representation of U+0041 LATIN CAPITAL LETTER A
were '%', or any other strange shape, unicode would still be valid. (The
unicode code charts helpfully mention if a code point typically looks
similar to another code point, but this is not core unicode.)
So really, those all-caps code point descriptions are the essence of
unicode: unicode is a mapping of numbers to whatever is represented by the
corresponding description (so we see just how *abstract* unicode really is).
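As a rough illustration of that mapping, here is a small sketch using
Python's standard unicodedata module, which exposes those all-caps
descriptions. The helper name describe() is made up for this example, and
the names returned depend on the Unicode database shipped with your Python.

```python
import unicodedata

def describe(ch):
    """Return (code point label, official description) for one character.

    describe() is a hypothetical helper, not part of any library.
    """
    return "U+%04X" % ord(ch), unicodedata.name(ch)

# The description is attached to the number, not to whatever shape
# your font happens to draw for it.
for ch in ("A", "(", "\u05d0"):
    label, name = describe(ch)
    print(label, name)

# The mapping also works in reverse: from description back to code point.
assert unicodedata.lookup("LATIN CAPITAL LETTER A") == "A"
```

Nothing here touches fonts or rendering at all, which is the point: the
number and the description are all that unicode itself pins down.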
I don't know what to call this essence that the code points are mapped to.
Joel of joelonsoftware.com uses the analogy of unicode mapping numbers to
the Platonic forms of letters/numbers/symbols, which is really the perfect
analogy for what unicode does.
However, sometimes unicode botches even the code point descriptions by
making them too tied to the glyph. E.g., U+0028 LEFT PARENTHESIS should
really be OPEN PARENTHESIS, because right-to-left text (Arabic or Hebrew)
will have U+0028 on the right side, and the glyph will face the opposite