[Python-3000] How will unicode get used?

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Thu Sep 21 11:22:29 CEST 2006


David Hopwood <david.nospam.hopwood at blueyonder.co.uk> writes:

> People do need to realize that *all* Unicode encodings are
> variable-length, in the sense that abstract characters can be
> represented by multiple code points.

Unicode algorithms for case mapping, word splitting, collation, etc.
are generally defined in terms of code points. The character database
is keyed by code points, which are the largest practical text unit
with a finite domain.
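
As a rough illustration, here is a minimal sketch that looks up
character database properties one code point at a time. It assumes
Python 3.3 or later, where str is a sequence of code points; the
describe helper and the sample strings are made up for this example.

```python
import unicodedata

# The same abstract character written two ways:
# one precomposed code point, or "e" plus a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"


def describe(text):
    """Return (code point, name, general category) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


print(describe(precomposed))
print(describe(decomposed))

# Normalization is also defined over sequences of code points.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Every lookup takes a single code point as its key, even though the
decomposed spelling needs two of them for one abstract character.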

Even if there are other units at a higher level, any algorithm that
determines these high-level text boundaries is easier to implement in
terms of code points than in terms of the even lower-level UTF-x code
units.
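
To make the comparison concrete, here is a hedged sketch, again
assuming Python 3.3 or later. simple_clusters is a deliberately
simplified stand-in for a real boundary algorithm (it only attaches
combining marks to the preceding base, and is not the full UAX #29
rule set). utf16_units and code_points_from_utf16 are hypothetical
helpers that show the extra surrogate-pairing step an implementation
working on UTF-16 code units would need before it could consult the
character database at all.

```python
import unicodedata


def simple_clusters(text):
    """Group each base code point with the combining marks that follow it.

    A simplification for illustration, not the full UAX #29 rules.
    """
    clusters = []
    for ch in text:
        # combining() returns a nonzero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [
        int.from_bytes(data[i:i + 2], "little")
        for i in range(0, len(data), 2)
    ]


def code_points_from_utf16(units):
    """Pair up surrogates to recover code points from UTF-16 code units."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code unit (or an unpaired surrogate, passed through as-is).
            result.append(unit)
            i += 1
    return result


# "e" + combining acute accent, then U+1D11E MUSICAL SYMBOL G CLEF.
text = "e\u0301\U0001D11E"

print(simple_clusters(text))
print(utf16_units(text))
print(code_points_from_utf16(utf16_units(text)) == [ord(c) for c in text])
```

The boundary logic itself is a short loop over code points; the
surrogate handling is pure overhead that exists only at the code
unit level.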

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

