OT [Way OT]: Unicode Unification Objections

François Pinard pinard at iro.umontreal.ca
Mon May 8 19:43:32 CEST 2000

Kevin Russell <krussell4 at videon.home.com> writes:

> "Dennis E. Hamilton" wrote:

> > In Japanese texts, when a borrowed or employed Korean word is used,
> > a desired practice is to render the Korean characters as different,
> > even though some or all of them involve "the same character" common to
> > both languages.  However, the iconography (or calligraphy) is commonly
> > different.

> The entire point of markup is that distinctions like this *shouldn't*
> be preserved in simple text.

It depends.  English and French use the same, or quite similar, calligraphy
for letters, and using common fonts is not at all a problem.  However, we
do not use the same calligraphy for quotes, so English quotes and French
quotes are different, and Unicode keeps them different.
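
A minimal sketch of the point about quotes, assuming a modern Python 3 and
its standard `unicodedata` module (neither appears in the original message,
and the helper name `describe` is only illustrative): the English and French
opening quotation marks are separate code points with separate names, while
the letter `e` is a single character shared by both languages.

```python
import unicodedata

# English-style and French-style opening quotation marks.
english_open = "\u201c"   # “
french_open = "\u00ab"    # «

def describe(ch):
    """Return (code point as U+XXXX, official Unicode name) for one character."""
    return "U+%04X" % ord(ch), unicodedata.name(ch)

# The two quotes print different code points and names; 'e' is shared.
for ch in (english_open, french_open, "e"):
    print(describe(ch))
```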

If, by tradition, the same letter used different calligraphy in English
and in French, I would not like to write French using English letters,
and you would object to being forced to use French letters to write English,
even though the letters would be recognisable and we could read each other.

If we were using a common international character set, it would be important
that I could mix English and French in the same text, using different
characters (because we both know they are different, though similar and
recognisable), and without having to resort to typographical annotations.
Any stranger writing French and holding strong opinions (because s/he studied
French at school or while travelling, or after s/he had a love affair in
France), who would dare to spell out my own needs for French, would be rather
unwelcome.

If many Japanese and Chinese feel a strong need to distinguish their
characters in a common character set, they just know better than me,
however competent I may feel in Asian matters, and regardless of my great
urge to spare one more bit.

> A similar situation holds in almost every written language.  We often
> dump Latin or French words into English text.  Even though they may be
> written in the same alphabet, we usually want them to *look* different
> from ordinary English words.

The situation is just not similar.  I do not have a strong, perpetual need
that the look be different, and the truth is that as a French reader, I
honestly do not mind much if we use the same characters.  Quite the contrary,
it is quite convenient, and not shocking at all.  (Not everybody is so
lucky.  For example, when Unicode decided that `y-diaeresis' could be used
to replace the Dutch `ij' ligature, Dutch people were not very happy.[1])

Agreed that Japanese people are themselves divided on this.  Some are ready
to accept Han unification together with Microsoft Windows, if they really
have to be bundled together; they just don't care.  Others do.

We have the same in French.  Some people are quite ready to drop diacritic
marks while using computers.  They never got them right at school anyway,
and diacritics create technical problems they are only too happy to declare
intractable if they can.  Why not drop proper spelling in the same blow?
More than one has said: "We did not choose literature, anyway!".  ASCII is
more than enough for them.  Please don't take this to mean that ASCII is
enough for French.

I'm not saying that the Japanese are right or wrong about unification.  This
is their problem and their decision.  One sure thing is that _we_ are wrong
when we are haughty enough to have a strong opinion about what they should do.

Granted that Python supports Unicode.  There is a danger that, because we
love Python, and because Unicode is well supported, we start blindly
loving Unicode.  What I'm trying to say is that we should keep a clear
mind, and happily use Unicode while keeping in mind that this is not the
final word on everything about charsets.  Don't become Unicode fanatics.

[1] If I remember correctly.  This was years ago.

François Pinard   http://www.iro.umontreal.ca/~pinard

