[Python-Dev] len(chr(i)) = 2?
greg.ewing at canterbury.ac.nz
Wed Nov 24 00:49:42 CET 2010
Alexander Belopolsky wrote:
> Because the most commonly used characters are all in the Basic
> Multilingual Plane, converting between surrogate pairs and the
> original values is often not tested thoroughly. This leads to
> persistent bugs, and potential security holes, even in popular and
> well-reviewed application software.
Maybe Python should have used UTF-8 as its internal Unicode
representation. Then people who were foolish enough to assume
one character per string item would have their programs break
rather soon under only light Unicode testing. :-)
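
A rough sketch of why that would be, assuming Python 3.3 or later
(where a str is indexed by code point): it counts how many code
units a character needs in UTF-8 and in UTF-16. The helper name
code_units and the sample characters are only illustrative, not
anything from the thread.

```python
def code_units(ch, encoding):
    """Return how many code units `ch` occupies in the given encoding."""
    # UTF-8 code units are 1 byte wide, UTF-16 code units are 2 bytes wide.
    unit_size = {"utf-8": 1, "utf-16-le": 2}[encoding]
    return len(ch.encode(encoding)) // unit_size


# ASCII letter, Latin-1 letter, BMP symbol, and a non-BMP character.
samples = ["a", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    print("U+%04X  utf-8: %d  utf-16: %d" % (
        ord(ch),
        code_units(ch, "utf-8"),
        code_units(ch, "utf-16-le"),
    ))
```

In UTF-16 only the last sample needs two code units (a surrogate
pair), so a one-item-per-character assumption survives any test
that stays inside the BMP. In UTF-8 the same assumption already
fails for U+00E9, which is the kind of input even light testing
tends to include.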