Turkic I and re

Thu Sep 15 11:00:09 EDT 2011

On 09/15/11 09:06, MRAB wrote:
> It's somewhat unlikely that Unicode will become locale-dependent in
> Python because it would cause problems; you don't want:
>
>       "i".upper() == "I"
>
> to be maybe true, maybe false.
>
> An option would be to specify whether it should be locale-dependent.

There have been several times when I've wished that unicode 
strings would grow something like

   .locale_aware_insensitive_cmp(other[,
      locale=locale.getdefaultlocale()]
      )

to return -1/0/1 like cmp(), or, in cases where sort order is 
nonsensical/arbitrary (such as for things like Mandarin glyphs),

   .insensitive_locale_equal(other[,
      locale=locale.getdefaultlocale()]
      )

so you could do something like

  if "i".locale_aware_insensitive_cmp("I"):
    not_equal()
  else:
    equal()

or

  if "i".insensitive_locale_equal("I"):
    equal()
  else:
    not_equal()

because while I know that .upper() and .lower() don't work in a 
lot of cases, I don't really care about the upper/lower'ness of 
the result; I want to do an insensitive compare (and most of the 
time it's just for equality, not for sorting).  It's my 
understanding[1] that the same goes for German, where 
"ß".upper() is traditionally written as "SS" but "SS".lower() is 
traditionally just "ss" instead of "ß".
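
As a rough sketch of the idea (not an existing API), the two 
wished-for methods could be approximated as plain functions.  The 
names insensitive_equal() and insensitive_cmp() and the "turkic" 
flag are hypothetical; the flag merely stands in for a real locale 
argument.  The sketch assumes str.casefold(), which only exists in 
Python 3.3 and later, and it orders by code point rather than by 
any locale's collation rules.

```python
def _fold(s, turkic=False):
    """Return a caseless key for s (hypothetical helper)."""
    if turkic:
        # Turkic pairing: dotted I-with-dot (U+0130) folds to "i",
        # and plain "I" folds to dotless i (U+0131).
        s = s.replace("\u0130", "i").replace("I", "\u0131")
    # Default Unicode case folding, e.g. "ß" -> "ss".
    return s.casefold()


def insensitive_equal(a, b, turkic=False):
    """True if a and b match caselessly under the chosen rules."""
    return _fold(a, turkic) == _fold(b, turkic)


def insensitive_cmp(a, b, turkic=False):
    """Return -1/0/1 like the old cmp(), comparing folded strings.

    Ordering is by code point of the folded text, NOT locale collation.
    """
    x, y = _fold(a, turkic), _fold(b, turkic)
    return (x > y) - (x < y)


# Usage:
#   insensitive_equal("i", "I")                -> True
#   insensitive_equal("i", "I", turkic=True)   -> False
#   insensitive_equal("ß", "SS")               -> True
```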

So if these language-dependent comparisons were relegated to a 
well-tested core method of a unicode string, it may simplify the 
work/issue for you.

-tkc

[1]
http://en.wikipedia.org/wiki/Letter_case#Special_cases