OT [Way OT]: Unicode Unification Objections

François Pinard pinard at iro.umontreal.ca
Mon May 8 16:48:18 EDT 2000


"Fredrik Lundh" <effbot at telia.com> wrote:

> on the other hand, if you tell me that I should start sorting swedish
> names in ISO Latin 1 order (or german, or english, etc), or that leaving
> out the dots and rings when comparing strings won't hurt anyone, I'll
> reach for my revolver.

I'm not going to tell you that, ever! :-)

Some proponents of decomposed representation of characters in Unicode are
asserting that decomposition will tremendously ease sorting.  This is not
only a weak argument, it also reveals how naive they are about sorting.
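
To make the point concrete, here is a minimal sketch in present-day
Python.  The five one-letter "names" and the `swedish_key` helper are
illustrative assumptions, not a real collation implementation (the helper
only handles lowercase letters of the Swedish alphabet).  It compares
plain code-point order on precomposed strings, code-point order on
decomposed strings, and an explicit Swedish alphabet:

```python
import unicodedata

# Illustrative data: five one-letter "names", stored precomposed (NFC).
names = ["\u00f6", "z", "\u00e4", "\u00e5", "a"]   # ö z ä å a

# Hypothetical helper table: the Swedish alphabet, where å ä ö follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def nfd_key(s):
    """Sort key: the canonically decomposed form, compared by code point."""
    return unicodedata.normalize("NFD", s)


def swedish_key(s):
    """Sort key: position of each letter in the Swedish alphabet."""
    composed = unicodedata.normalize("NFC", s.lower())
    return [SWEDISH_ALPHABET.index(ch) for ch in composed]


by_code_point = sorted(names)                 # a z ä å ö
by_decomposed = sorted(names, key=nfd_key)    # a ä å ö z
by_swedish = sorted(names, key=swedish_key)   # a z å ä ö

for label, result in [("precomposed", by_code_point),
                      ("decomposed", by_decomposed),
                      ("swedish", by_swedish)]:
    print(f"{label:12} {' '.join(result)}")
```

Neither code-point ordering matches the Swedish one, and decomposing
moves the accented letters in among the unaccented ones, which is exactly
the "leaving out the dots and rings" effect.  Whatever the representation,
a language-specific ordering still has to be supplied on top of it.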

And besides, even if decomposition were a fantastic advantage for sorting
(which it is not, anyway), favouring it would still be a wrong weighing
of values.  Applications access individual characters (fetch and store)
a great deal more often than they sort them.  Fixed width is the usual
way to simplicity and speed in this area, and Python recognised this by
converting between UTF-8 and its internal representation, instead of
keeping Unicode strings internally represented as UTF-8, say.
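
The difference in access cost can be sketched as follows, in present-day
Python.  The function names are hypothetical and the fixed-width buffer
is modelled with UTF-32 bytes purely for illustration; this is not how
Python itself is implemented.  Fetching the n-th character from UTF-8
means walking over every earlier character, while a fixed-width buffer
allows a direct offset computation:

```python
def _utf8_len(lead):
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_char_at(data, index):
    """Fetch a character from UTF-8 bytes: must scan from the start."""
    pos = 0
    for _ in range(index):
        pos += _utf8_len(data[pos])
    return data[pos:pos + _utf8_len(data[pos])].decode("utf-8")


def fixed_char_at(data, index, width=4):
    """Fetch a character from fixed-width (UTF-32-LE) bytes: one slice."""
    return data[width * index:width * (index + 1)].decode("utf-32-le")


text = "na\u00efve caf\u00e9"           # naïve café, 10 characters
as_utf8 = text.encode("utf-8")          # 12 bytes, variable width
as_fixed = text.encode("utf-32-le")     # 40 bytes, 4 per character

print(utf8_char_at(as_utf8, 9), fixed_char_at(as_fixed, 9))
```

The scan in `utf8_char_at` grows with the index, whereas `fixed_char_at`
does the same amount of work for any position.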

A sad lie of Unicode (one lie among others :-) is the promise of fixed
width: even with internal UCS-2 representation, characters are still of
varying length for those nations which did not lobby soon or strongly
enough to have pre-composed characters.  Americans, Europeans, Vietnamese
and many other nations are already happy in that respect; they are not
the ones who will complain, as they have no reason of their own to be
dissatisfied.
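
A small sketch of this asymmetry, in present-day Python with the standard
`unicodedata` module.  The three sample strings are illustrative choices,
not taken from any particular application.  Each is entered as a base
letter plus combining marks, then composed as far as Unicode allows:

```python
import unicodedata

# Each sample is typed as a base character followed by combining marks.
samples = {
    "French e-acute": "e\u0301",
    "Vietnamese e-circumflex-dot-below": "e\u0302\u0323",
    "Devanagari ka + vowel sign i": "\u0915\u093f",
}


def composed_length(s):
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", s))


lengths = {name: composed_length(s) for name, s in samples.items()}

for name, n in lengths.items():
    print(f"{name}: {n} code point(s) after composition")
```

The French and Vietnamese letters collapse to a single pre-composed code
point, while the Devanagari syllable stays at two because no pre-composed
form exists for it.  For such scripts, one perceived character still
spans several fixed-width units.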

> afaik, ISO 10646 has been adopted as a japanese national standard,
> so I assume they've made up their mind on this one.

They did?  I have not closely followed polls and lobbying over the last
few years (I rather hate politics :-).  That would be good news, indeed.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard





