Unicode Unification Objections

Fredrik Lundh effbot at telia.com
Mon May 8 14:46:19 EDT 2000

Aahz Maruch <aahz at netcom.com> wrote:
> >the distinction cannot be preserved in a naked unicode character
> >stream, but it's done that way on purpose.  you cannot really handle
> >text strings correctly (rendering, sorting, comparing, etc) unless you
> >have language and locale information.
> >
> >this is as true for unicode as it is for latin 1 or any other
> >character set.  after all, the "western culture" isn't really as
> >homogeneous as you americans seem to think ;-)
> In other words, "someone" needs to devise a standardized system that
> encodes all the information needed to represent a string.  To deal with
> the cases Dennis talks about, you need to concatenate multiple string
> objects into some larger buffer.  Am I understanding you?

XML supports language markup (the xml:lang attribute).
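
a minimal sketch of reading that markup from python, assuming the
standard library's xml.etree.ElementTree parser.  the sample document
and the iter_with_lang helper are made up for illustration; the only
fixed part is that the predefined xml: prefix maps to the namespace
URI shown below, and that xml:lang is inherited by child elements.

```python
import xml.etree.ElementTree as ET

# the xml: prefix is predefined and always bound to this namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# hypothetical sample document: english by default, one swedish paragraph
doc = ET.fromstring(
    '<doc xml:lang="en">'
    '<p>colour</p>'
    '<p xml:lang="sv">färg</p>'
    '</doc>'
)

def iter_with_lang(elem, inherited=None):
    """Yield (element, language) pairs; children inherit xml:lang."""
    lang = elem.get(XML_LANG, inherited)
    yield elem, lang
    for child in elem:
        yield from iter_with_lang(child, lang)

langs = [(elem.tag, lang) for elem, lang in iter_with_lang(doc)]
print(langs)  # [('doc', 'en'), ('p', 'en'), ('p', 'sv')]
```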

language/locale information can also be carried in HTTP headers
(Content-Language), MIME headers, etc.

in 31-bit unicode, there's also something called "plane 14 language tags"
which can (in theory, at least) be used to insert language codes in a
unicode stream.
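
a sketch of how such a tag could be built, assuming the scheme where
U+E0001 (LANGUAGE TAG) is followed by tag characters that mirror
printable ASCII at an offset of 0xE0000.  the language_tag and
strip_tags helpers are hypothetical names, not part of any library,
and as far as i know later versions of the unicode standard
discourage using this mechanism for language tagging.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000      # tag characters mirror ASCII at this offset

def language_tag(code):
    """Encode a printable ASCII language code (e.g. 'sv') as a plane 14 tag."""
    if not (code.isascii() and code.isprintable()):
        raise ValueError("language code must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in code)

def strip_tags(text):
    """Remove all plane 14 tag characters, leaving the plain text."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)

# the tag travels inside the character stream, ahead of the text it labels
tagged = language_tag("sv") + "färg"
print([hex(ord(c)) for c in tagged[:3]])  # ['0xe0001', '0xe0073', '0xe0076']
print(strip_tags(tagged))                 # färg
```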

