[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

Martin v. Löwis report at bugs.python.org
Sun Sep 18 10:45:53 CEST 2011


Martin v. Löwis <martin at v.loewis.de> added the comment:

Tom: it's intentional that .title() doesn't use traditional word break algorithms. In 2.x, "foo3bar".title() is "Foo3Bar", i.e. the 3 counts as a word end. So neither UTS#18 \w nor UAX#29 applies. In UTS#18 terminology, .title() more closely matches \alpha+, despite UTS#18 saying that this shouldn't be used for word-breaking.
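
For illustration, a minimal sketch of that behaviour; the helper name title_samples is made up for this example, and the results assume a current CPython 3, where the digit and apostrophe cases behave as in 2.x:

```python
# Hypothetical helper, for illustration only: collect str.title() results
# for inputs where a non-letter sits in the middle of a "word".
def title_samples():
    samples = ["foo3bar", "they're"]
    return {s: s.title() for s in samples}

results = title_samples()
# The digit ends the first run of letters, so "bar" is capitalised again.
print(results["foo3bar"])   # Foo3Bar
# The apostrophe does the same, which a UAX#29 word breaker would not do.
print(results["they're"])   # They'Re
```

Both results follow from .title() treating any uncased character as the end of a word, rather than from a word-boundary algorithm.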

It's not clear to me how UTS#18 defines \alpha. On the one hand, they say that marks should be included; on the other hand, they refer to the Alphabetic derived category, which doesn't include marks except for the few that have been included in Other_Alphabetic.
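
A rough sketch of how this looks from Python, under some assumptions: the unicodedata module does not expose the Alphabetic or Other_Alphabetic properties, so this only shows the general category and str.isalpha(). The two code points are my own choice of examples; U+0345 is, as far as I know, listed under Other_Alphabetic in PropList.txt, which should be checked against the Unicode data files.

```python
import unicodedata

# Example code points chosen for illustration (not from the report itself).
ACUTE = "\u0301"          # COMBINING ACUTE ACCENT, an ordinary mark
YPOGEGRAMMENI = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI, assumed Other_Alphabetic

def describe(ch):
    """Return (name, general category, str.isalpha()) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isalpha())

for ch in (ACUTE, YPOGEGRAMMENI):
    print(describe(ch))
```

Both characters have general category Mn, and str.isalpha() is false for both, since it looks only at the letter categories and not at the Alphabetic property.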

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________

