[issue1581182] Definition of a "character" is wrong

Marc-Andre Lemburg report at bugs.python.org
Mon Mar 30 16:42:03 CEST 2009


Marc-Andre Lemburg <mal at egenix.com> added the comment:

See this talk for an explanation of the various Unicode terms and how
they map to Python's implementation:

http://www.egenix.com/library/presentations/#PythonAndUnicode

Also note that the Unicode standard has evolved considerably since
Unicode support was added to Python in late 1999. Some terms used in
Python differ from those used in Unicode 5.0, or have since been defined
more strictly than was common at the time.
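As an illustration of how a Unicode term and Python's unit of storage can diverge, here is a minimal sketch. It assumes Python 3, where a `str` is a sequence of code points and `len()` counts code points. It shows that one user-perceived character can span several code points. The variable names are illustrative only; the example is not taken from the talk linked above.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (U+0301): a reader sees one
# character, but the string holds two code points, and len() counts those.
decomposed = "e\u0301"
print(len(decomposed))  # 2
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# NFC normalization composes the pair into the precomposed U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed), composed == "\u00e9")  # 1 True
```

So "character" in the sense of a `str` element means "code point", not "what the user perceives as one letter".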

And finally: don't forget that Python provides ways of *working* with
Unicode; it does not guarantee that a Python Unicode string is always
well-formed, e.g. that every UTF-16 surrogate appears as part of a
complete pair. It is quite possible to store lone surrogates and invalid
or unassigned code points in a Python Unicode string.
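The following minimal sketch illustrates the point above, assuming Python 3.3 or later (PEP 393 storage, so there is no narrow/wide build distinction as in the 2009-era interpreters). It stores a lone surrogate and a noncharacter in ordinary `str` objects. It then shows that strict UTF-8 encoding rejects the surrogate, while the standard `surrogatepass` error handler lets it through. The variable names are illustrative only.

```python
import unicodedata

# A lone high surrogate is a valid element of a Python str, but it is not
# a Unicode scalar value, so well-formed UTF-8/UTF-16 cannot contain it.
lone = "\ud800"
print(len(lone), unicodedata.category(lone))  # 1 Cs

# U+FFFF is a noncharacter (category Cn); Python stores it without complaint.
nonchar = "\uffff"
print(unicodedata.category(nonchar))  # Cn

# Strict encoding refuses the lone surrogate...
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encoding failed:", type(exc).__name__)

# ...while the "surrogatepass" error handler encodes it anyway.
print(lone.encode("utf-8", "surrogatepass"))  # b'\xed\xa0\x80'

# A supplementary character is one code point in the str, but UTF-16
# needs a surrogate pair (D83D DE00) to represent it.
smiley = "\U0001F600"
print(len(smiley), smiley.encode("utf-16-le"))  # 1 b'=\xd8\x00\xde'
```

Python leaves it to the code doing the encoding or validation to decide whether such strings are acceptable.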

----------
nosy: +lemburg

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1581182>
_______________________________________


More information about the Python-bugs-list mailing list