Incorrect title case?
Terry Reedy
tjreedy at udel.edu
Sat Jan 17 17:14:38 EST 2009
John Machin wrote:
> On Jan 17, 9:07 am, MRAB <goo... at mrabarnett.plus.com> wrote:
>> Python 2.6.1
>>
>> I've just found that the following 4 Unicode characters/codepoints don't
>> behave as I'd expect: Dž (U+01C5), Lj (U+01C8), Nj (U+01CB), Dz (U+01F2).
>>
>> For example, u"\u01C5".istitle() returns True and
>> unicodedata.category(u"\u01C5") returns "Lt", but u"\u01C5".title()
>> returns u'\u01C4', which is the uppercase equivalent. Are these mistakes
>> in the Unicode database?
>
> Doesn't look like it. AFAICT it's a mistake in Objects/unicodectype.c,
> function _PyUnicode_ToTitlecase.
>
> See http://svn.python.org/view/python/trunk/Objects/unicodectype.c?rev=66362&view=markup
>
> The code that says:
>     if (ctype->title)
>         delta = ctype->title;
>     else
>         delta = ctype->upper;
> should IMHO merely be:
>     delta = ctype->title;
>
> A value of zero for ctype->title should be interpreted simply as the
> offset to add to the ordinal, as it is in the sibling
> _PyUnicode_To(Upper|Lower)case functions. See also
> Tools/unicode/makeunicodedata.py, which treats upper, lower and title
> identically when preparing the tables used by those 3 functions.
>
> AFAICT making that change will fix the problem for those four
> characters and not ruin any others.
>
> The error that you noticed occurs as far back as I've looked (2.1) and
> also occurs in 3.0.
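
The fallback described in the quoted analysis can be modelled in a few lines of Python. This is only a simplified sketch, not the C source: the CType record, the signed deltas and both function names are assumptions made for illustration, and it is written for Python 3.

```python
from collections import namedtuple

# Hypothetical stand-in for the per-character type record: each field is
# the signed offset to add to the ordinal to reach that case form.
CType = namedtuple("CType", "upper lower title")


def to_titlecase_with_fallback(ch, ctype):
    """Model of the quoted code: a zero title delta falls back to upper."""
    delta = ctype.title if ctype.title else ctype.upper
    return chr(ord(ch) + delta)


def to_titlecase_direct(ch, ctype):
    """Model of the proposed change: the title delta is always used."""
    return chr(ord(ch) + ctype.title)


# U+01C5 is already titlecase: its uppercase is U+01C4 (delta -1), its
# lowercase is U+01C6 (delta +1), and its titlecase is itself (delta 0).
dz_caron = CType(upper=-1, lower=1, title=0)

# An ordinary lowercase letter: upper and title deltas are both -32.
latin_a = CType(upper=-32, lower=0, title=-32)

for name, fn in (("fallback", to_titlecase_with_fallback),
                 ("direct", to_titlecase_direct)):
    print(name,
          "U+%04X" % ord(fn("\u01c5", dz_caron)),
          "U+%04X" % ord(fn("a", latin_a)))
```

In this model the two versions agree for "a" and differ only where the title delta is zero while the upper delta is not, which is the situation the four reported characters are in.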
Please post a report to the tracker at bugs.python.org.
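
A short script along these lines might be attached to such a report. It is a sketch only: the helper name describe and the dictionary keys are made up for illustration, and it is written for Python 3 (on 2.x the literals would need the u"" prefix used earlier in the thread). The value printed for title_is_identity depends on the interpreter version, so no expected output is given.

```python
import unicodedata

# The four code points named in the original message.
TITLECASE_DIGRAPHS = ["\u01C5", "\u01C8", "\u01CB", "\u01F2"]


def describe(ch):
    """Collect the properties discussed in the thread for one character."""
    return {
        "codepoint": "U+%04X" % ord(ch),
        "category": unicodedata.category(ch),
        "istitle": ch.istitle(),
        # True when title() leaves an already-titlecase character alone.
        "title_is_identity": ch.title() == ch,
    }


for ch in TITLECASE_DIGRAPHS:
    print(describe(ch))
```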