[Python-Dev] UTF-16 code point comparison
Fri, 28 Jul 2000 09:42:56 -0700
> From: Tim Peters [mailto:email@example.com]
> > ... Don't know how long it will take this half of the world to
> > realize it, but UCS-4 is inevitable.
> [Bill Tutt]
> > On new systems perhaps, but important existing systems (Win32,
> > and probably Java) are stuck with that bad decision and have to
> > use UTF-16 for backward compatibility purposes.
> Somehow that doesn't strike me as a good reason for Python to mimic them.
So don't. If you think UTF-16 is yet another bad engineering decision, then
take the hit now of making Python's Unicode support natively UCS-4 so we
don't have a backward compatibility problem when the next Unicode or ISO
10646 revision comes out.
Just realize and accept the cost of doing so: constant conversions for a
nice big chunk of your users.
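
To make that conversion cost concrete, here is a minimal sketch of the round
trip a natively UCS-4 string would need before it reaches a UTF-16 API. It
assumes a modern Python 3 where str is a sequence of code points; the helper
names to_utf16_units and from_utf16_units are hypothetical, made up for this
illustration.

```python
def to_utf16_units(text):
    """Return the UTF-16 code units (as ints) for a str of code points."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def from_utf16_units(units):
    """Rebuild a str from a list of UTF-16 code units."""
    data = b"".join(unit.to_bytes(2, "little") for unit in units)
    return data.decode("utf-16-le")


# One BMP character plus one character outside the BMP.
sample = "A\U00010000"
units = to_utf16_units(sample)

# Two code points become three 16-bit units: the second one needs a
# surrogate pair.
print(len(sample), len(units))
print([hex(u) for u in units])
```

Every call across the boundary pays for a copy like this in each direction,
which is the cost being described above.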
> > Surrogates aren't as far out as you might think. (The next rev of
> > the Unicode spec)
> But indeed, that's the *point*: they exhausted their 64K space in just a
> few years. Now the same experts say that adding 4 bits to the range will
> suffice for all time; I don't buy it; they picked 4 bits because that's
> what the surrogate mechanism was defined earlier to support.
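
For reference, the "4 bits" figure falls out of the surrogate arithmetic: 1024
high surrogates times 1024 low surrogates gives 2**20 extra code points, which
is 16 more planes of 64K each. The sketch below follows the standard UTF-16
pairing formula; the function names to_surrogates and from_surrogates are
hypothetical, chosen only for this illustration.

```python
HIGH_START = 0xD800  # first high (leading) surrogate
LOW_START = 0xDC00   # first low (trailing) surrogate


def to_surrogates(code_point):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000  # 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest pair marks the ceiling of what UTF-16 can address.
ceiling = from_surrogates(0xDBFF, 0xDFFF)
supplementary = 1024 * 1024  # number of distinct surrogate pairs

print(hex(ceiling))
print(supplementary == 2 ** 20 == 16 * 0x10000)
```

Anything past that ceiling would need a different mechanism, which is the
limit being argued about here.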
I don't think the experts are saying the extra 4 bits will suffice for all
time, but it should certainly suffice until we meet aliens from a different
> > That's certainly sooner than Win32 going away. :)
> I hope it stays around forever -- it's a great object lesson in what
> optimizing for yesterday's hardware can buy you <wink>.
Heh. A dev manager from Excel made the exact same comment to me just