[I18n-sig] Unicode strings: an alternative

Tom Emerson tree@basistech.com
Fri, 5 May 2000 08:34:35 -0400 (EDT)


Just van Rossum writes:
 > Good point. All this taken together still means to me that comparisons
 > between wide and narrow strings should take place at the character level,
 > which implies that coercion from narrow to wide is done at the character
 > level, without looking at the encoding. (Which in my book in turn still
 > implies that as long as we're talking about Unicode, narrow strings are
 > effectively Latin-1.)

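Only true if "wide" strings are encoded in UCS-2 or UCS-4. If "wide
characters" are Unicode, but stored in UTF-8 encoding, then you lose:
anything above 0x7F occupies more than one byte, so the stored units
no longer line up with characters.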
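
To make that concrete, here is a minimal sketch. It assumes Python 3
semantics, with bytes standing in for "narrow" strings and str for
"wide" ones; the sample string is made up for illustration and is not
taken from any proposal on this list.

```python
# Assumption: bytes plays the role of a narrow string, str of a wide one.
narrow = b"caf\xe9"      # one byte per character, last byte is 0xE9
wide = "caf\u00e9"       # the same text as Unicode code points

# Widening byte by byte, without looking at any encoding, gives exactly
# the result of decoding the bytes as Latin-1.
widened = "".join(chr(b) for b in narrow)
latin1_decoded = narrow.decode("latin-1")

# If the wide string is stored as UTF-8, U+00E9 becomes two bytes, so a
# unit-by-unit comparison against the narrow string no longer matches.
utf8_storage = wide.encode("utf-8")
bytewise_equal = (narrow == utf8_storage)

print(widened == wide, utf8_storage, bytewise_equal)
```

With fixed-width storage the widened string matches the wide one; with
UTF-8 storage the two byte sequences differ in both length and content.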

Hmmmm... how often do you expect to compare narrow vs. wide strings
using the default comparison (i.e. == or !=)? What if my narrow strings
are Latin-3 and the comparison is done byte by byte? I may very well
have two strings (one narrow, one wide) that compare equal even though
they represent different text. Not exactly what I would expect.
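
A small sketch of the Latin-3 case, again assuming Python 3 (bytes as
the narrow string, str as the wide one) and using the standard
"iso8859-3" codec. The byte value 0xA1 is picked purely as an
illustration: it is LATIN CAPITAL LETTER H WITH STROKE (U+0126) in
Latin-3 but INVERTED EXCLAMATION MARK (U+00A1) in Latin-1.

```python
# Assumption: the narrow string was produced by a Latin-3 application.
narrow = b"\xa1"         # intended as U+0126 (H with stroke) in Latin-3
wide = "\u00a1"          # inverted exclamation mark

# Character-level coercion treats each byte value as a code point,
# which is the Latin-1 interpretation, so the strings compare equal.
naive_equal = "".join(chr(b) for b in narrow) == wide

# Decoding with the encoding the data was actually written in shows
# that the two strings are different text.
correct = narrow.decode("iso8859-3")
aware_equal = (correct == wide)

print(naive_equal, hex(ord(correct)), aware_equal)
```

The naive comparison reports equality while the encoding-aware one
does not, which is the surprise described above.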

     -tree

[I'm flying from Seattle to Boston today, so eventually I will
 disappear for a while]

-- 
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"