[Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)

Neil Hodgson neilh at scintilla.org
Sun Apr 30 21:45:57 EDT 2000


> Of course I meant UCS-2, but I'm confused now. Isn't UCS-2 and UTF-16 more
> or less the same?

   Well, it depends on how far you stretch 'more or less'. UTF-16 has room to
encode about 900,000 additional characters by using pairs of 16-bit elements.
UCS-2 has room for about 57,000 characters, of which about 39,000 are
currently assigned. The extra UTF-16 range is mostly of concern for academic
use with extinct languages, although it could also be used for Han
deunification.
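
A minimal sketch in modern Python 3 (not the Python 1.6 under discussion) of how a code point beyond the 16-bit range becomes a pair of 16-bit units. It also shows the arithmetic that plausibly yields the 57,000 and 900,000 figures, assuming surrogates and private-use areas are excluded from the counts. The function and constant names are hypothetical. The cross-check relies on Python 3's built-in `utf-16-be` codec.

```python
# Sketch: UTF-16 surrogate pairs and the code-space arithmetic behind the
# figures above (assumes private-use areas are excluded from the counts).

BMP_VALUES = 1 << 16                      # 65,536 values in one 16-bit unit
SURROGATES = 0xDFFF - 0xD800 + 1          # 2,048 values reserved for pairs
BMP_PRIVATE_USE = 0xF8FF - 0xE000 + 1     # 6,400 private-use values
UCS2_USABLE = BMP_VALUES - SURROGATES - BMP_PRIVATE_USE   # 57,088

SUPPLEMENTARY = 1 << 20                   # 1,048,576 code points via pairs
SUPPLEMENTARY_PRIVATE_USE = 2 * (1 << 16) # planes 15 and 16: 131,072
PAIR_USABLE = SUPPLEMENTARY - SUPPLEMENTARY_PRIVATE_USE   # 917,504


def to_surrogate_pair(cp):
    """Split a code point in U+10000..U+10FFFF into (high, low) units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    v = cp - 0x10000                 # 20-bit offset
    high = 0xD800 + (v >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x10330                     # GOTHIC LETTER AHSA (an extinct script)
    high, low = to_surrogate_pair(cp)
    print(hex(high), hex(low))       # 0xd800 0xdf30
    # Cross-check against Python 3's built-in UTF-16 codec.
    assert chr(cp).encode("utf-16-be") == bytes(
        [high >> 8, high & 0xFF, low >> 8, low & 0xFF])
    assert from_surrogate_pair(high, low) == cp
    print(UCS2_USABLE, PAIR_USABLE)  # 57088 917504
```

The pair encoding carries 20 payload bits (10 in each half), which gives 2^20 code points. Planes 15 and 16 are private use, and subtracting them leaves 917,504.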

   For an editor to support UTF-16, I would expect it to treat 32-bit UTF-16
characters (surrogate pairs) as indivisible: at the very least, it should
never put the caret between the two halves, even if it has no sensible glyph
to display.
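
A minimal sketch, again in modern Python 3, of the caret rule described above. It assumes the editor's buffer is stored as a list of UTF-16 code units. The helper names (`caret_right`, `caret_left`, `utf16_units`) are hypothetical and are not taken from Scintilla or any other editor.

```python
# Sketch: caret movement over a buffer of UTF-16 code units that never
# stops between the two halves of a surrogate pair. Lone surrogates are
# stepped over one unit at a time.

def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def utf16_units(text):
    """Return the UTF-16 code units of a Python 3 str as a list of ints."""
    data = text.encode("utf-16-le")    # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def caret_right(units, pos):
    """Advance the caret one character, skipping both halves of a pair."""
    if pos >= len(units):
        return len(units)
    if (is_high_surrogate(units[pos]) and pos + 1 < len(units)
            and is_low_surrogate(units[pos + 1])):
        return pos + 2
    return pos + 1


def caret_left(units, pos):
    """Move the caret back one character, skipping both halves of a pair."""
    if pos <= 0:
        return 0
    if (pos >= 2 and is_low_surrogate(units[pos - 1])
            and is_high_surrogate(units[pos - 2])):
        return pos - 2
    return pos - 1


if __name__ == "__main__":
    units = utf16_units("a\U00010330b")  # 'a', one surrogate pair, 'b'
    print([hex(u) for u in units])     # ['0x61', '0xd800', '0xdf30', '0x62']
    stops, pos = [0], 0
    while pos < len(units):
        pos = caret_right(units, pos)
        stops.append(pos)
    print(stops)                       # [0, 1, 3, 4]
```

Treating a lone surrogate as a one-unit step is a design choice in this sketch. It lets the caret move through malformed text instead of getting stuck.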

   Neil




