[New-bugs-announce] [issue7663] UTF-16 build incorrectly translates cases for non-BMP code points
Jean-Paul Calderone
report at bugs.python.org
Sun Jan 10 06:27:30 CET 2010
New submission from Jean-Paul Calderone <exarkun at divmod.com>:
This issue may extend beyond just unicode.upper() and unicode.lower(), but it's very clear with these two methods, at least.
For example, consider DESERET SMALL LETTER EW. On a UTF-16 build, calling upper() on a string containing this character does not convert it to the capital form (DESERET CAPITAL LETTER EW):
>>> u'\N{DESERET SMALL LETTER EW}'.upper() == u'\N{DESERET SMALL LETTER EW}'
True
The character is not even recognized as lower case:
>>> u'\N{DESERET SMALL LETTER EW}'.islower()
False
With a UTF-32 build, however, the expected behavior (i.e., the behavior one would get for a BMP code point that has small and capital forms) is provided.
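One plausible explanation, which this report does not confirm, is that a UTF-16 build stores a non-BMP character as two surrogate code units and applies the case lookup to each unit separately. The sketch below imitates that in Python 3 syntax, on an interpreter that handles full code points (an assumption; it is not the 2.7 narrow build described above). The helper utf16_code_units is hypothetical and exists only for illustration.

```python
# A sketch under stated assumptions: it imitates per-code-unit case mapping
# and does not reproduce the interpreter's actual implementation.

small = "\N{DESERET SMALL LETTER EW}"
capital = "\N{DESERET CAPITAL LETTER EW}"


def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of integers.

    Hypothetical helper, written only for this illustration.
    """
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


units = utf16_code_units(small)

# A non-BMP character occupies two code units, both in the surrogate range.
is_surrogate = [0xD800 <= u <= 0xDFFF for u in units]

# Upper-casing each code unit on its own: surrogates carry no case
# information, so the units come back unchanged.
per_unit_upper = [ord(chr(u).upper()) for u in units]

print("code units:", [hex(u) for u in units])
print("all surrogates:", all(is_surrogate))
print("per-unit upper() changed anything:", per_unit_upper != units)

# Working on the whole code point gives the behavior the report expects.
print("whole-code-point upper() gives the capital:", small.upper() == capital)
print("whole-code-point islower():", small.islower())
```

If this guess is right, any case operation that walks the string one code unit at a time would show the same symptom, which fits the remark that the issue may extend beyond upper() and lower().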
----------
components: Interpreter Core
messages: 97500
nosy: exarkun
severity: normal
status: open
title: UTF-16 build incorrectly translates cases for non-BMP code points
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7663>
_______________________________________