[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

Tom Christiansen report at bugs.python.org
Sat Aug 27 00:00:18 CEST 2011


Tom Christiansen <tchrist at perl.com> added the comment:

Guido van Rossum <report at bugs.python.org> wrote
   on Fri, 26 Aug 2011 21:16:57 -0000: 

> Yeah, this should be fixed in 3.3 and probably backported to 3.2
> and 2.7.  (There is already no guarantee that len(s) ==
> len(s.title()), right?)

Well, *I* don't know of any such guarantee, 
but I don't know Python very well.

In general, Unicode makes very few guarantees about casing.  Under full
casemapping, which is the only way to do the silly Turkish stuff amongst
quite a bit else, any of the three casemappings can change the length of
the string.
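
A minimal sketch of the length point, assuming Python 3.3 or later, where the str methods apply full case mappings (older versions used simple one-to-one mappings and behave differently). The helper name length_changes and the three sample characters are chosen here for illustration and are not from the original message:

```python
# Assumption: Python 3.3+, where str.lower/upper/title use full case mappings.
# Sample characters picked for illustration:
#   U+00DF LATIN SMALL LETTER SHARP S
#   U+FB01 LATIN SMALL LIGATURE FI
#   U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
samples = ["\u00df", "\ufb01", "\u0130"]


def length_changes(s):
    """Return the length of s and of each of its three case mappings."""
    return {
        "original": len(s),
        "lower": len(s.lower()),
        "upper": len(s.upper()),
        "title": len(s.title()),
    }


for s in samples:
    print(ascii(s), length_changes(s))
```

Each of the three mappings grows at least one of these one-character strings, so code that sizes a buffer from len(s) before casing cannot be relied on.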

Other things you can't rely on are round-tripping and "single paths".  For
round-tripping, just look at the two lowercase sigmas: uppercase them both
and you can't get back to one of them.  By single paths, I mean that code
which first lowercases everything and then titlecases the first letter can
produce something different from code that titlecases just the original
first letter and then lowercases the rest.  That's because tc(x) and
tc(lc(x)) can be different.
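
Both failures can be sketched in a few lines, again assuming Python 3.3 or later with full case mappings. Calling str.title() on a one-character string stands in for tc() here, and U+1E9E (capital sharp s) is an example chosen for illustration, not one named in the original message:

```python
# Assumption: Python 3.3+ with full case mappings.

# Round-tripping: both lowercase sigmas share one uppercase form.
small_sigma = "\u03c3"   # GREEK SMALL LETTER SIGMA
final_sigma = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA
upper_small = small_sigma.upper()
upper_final = final_sigma.upper()
round_trip = upper_final.lower()   # lowercasing cannot recover the final form

# "Single paths": tc(x) versus tc(lc(x)) for a single character x.
# str.title() on a one-character string is used as a stand-in for tc().
x = "\u1e9e"             # LATIN CAPITAL LETTER SHARP S (illustrative choice)
tc_x = x.title()
tc_lc_x = x.lower().title()

print(ascii(upper_small), ascii(upper_final), ascii(round_trip))
print(ascii(tc_x), ascii(tc_lc_x))
```

The two sigmas collapse to the same capital, so the final form is lost on the way back, and the two titlecasing paths for U+1E9E yield strings of different lengths.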

--tom

----------
title: str.title()  is overzealous by upcasing combining marks inappropriately -> str.title() is overzealous by upcasing combining marks inappropriately

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________

