How do I display unicode value stored in a string variable using ord()
Paul Rubin
no.email at nospam.invalid
Sat Aug 18 14:26:21 EDT 2012
Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> (There is an extension to UCS-2, UTF-16, which encodes non-BMP characters
> using two code points. This is fragile and doesn't work very well,
> because string-handling methods can break the surrogate pairs apart,
> leaving you with an invalid Unicode string. Not good.)
...
> With PEP 393, each Python string will be stored in the most efficient
> format possible:
Can you explain the issue of "breaking surrogate pairs apart" a little
more? Switching between encodings based on the string contents seems
silly at first glance. Strings are immutable, so I don't understand why
Python wouldn't just use UTF-8 or UTF-16 for everything. UTF-8 is more
efficient for Latin-based alphabets and UTF-16 may be more efficient for
some other languages. I think even UCS-4 doesn't completely fix the
surrogate pair issue, if that issue is what I am guessing it is.
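
For concreteness, here is a rough sketch of what I understand "breaking a
surrogate pair apart" to mean. It assumes Python 3.3 or later, where a str
is indexed by code point, so the UTF-16 code-unit indexing of a narrow
build is simulated by slicing the encoded bytes. The helper names
utf16_units and slice_utf16 are made up for this illustration. The last
part, a combining accent that splits even under code-point indexing, is
only my guess at a similar problem that UCS-4 would not fix.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def slice_utf16(s, start, stop):
    """Slice s by UTF-16 code-unit index (hypothetical helper).

    Returns the raw UTF-16-LE bytes of the slice, which may or may not
    be well-formed UTF-16.
    """
    return s.encode("utf-16-le")[2 * start:2 * stop]


# 'a', one non-BMP character (U+1F600), 'b': three code points.
text = "a\U0001F600b"

# In UTF-16 the non-BMP character takes two code units (a surrogate pair),
# so the string is four code units long.
units = utf16_units(text)

# Slicing the first two code units keeps 'a' plus the high surrogate only.
broken = slice_utf16(text, 0, 2)

try:
    broken.decode("utf-16-le")
    slice_is_valid = True
except UnicodeDecodeError:
    # The lone high surrogate at the end is not valid UTF-16.
    slice_is_valid = False

# Assumed analogue under code-point indexing: 'e' followed by
# U+0301 COMBINING ACUTE ACCENT is two code points for one visible
# character, and slicing can separate them.
accented = "e\u0301"
base_only = accented[:1]

print([hex(u) for u in units])
print(slice_is_valid)
print(len(accented), repr(base_only))
```

The slice of two code units no longer decodes, which is the sort of
invalid string described in the quoted text. The accent case still
produces a valid string, just not the one the user sees on screen.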