[Python-ideas] Re: Idea: Tagged strings in python

Dec. 21, 2022

On Wed, 21 Dec 2022 at 17:39, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
> ...
> Oh yeah. Good points! Do we need a PEP for str.upper() to grow an optional 'locale' argument? I feel like there are examples other than the Turkish i's where this matters, but it's past my bedtime, so they aren't coming to mind.

I don't think str.upper() is the place for it; Python has a locale
module that is a better fit for this. (That's where you'd go if you
want to alphabetize strings with proper respect to language, for
instance.) But it's a difficult problem. Some languages have different
case-folding rules depending on whether you're uppercasing a name or
some other word. German needs to know whether a word is a noun,
because nouns keep their initial capital even in otherwise lowercase
text.
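
As a rough illustration of the alphabetizing use case mentioned above, here is a minimal sketch built on locale.strxfrm. The helper name sort_for_locale is hypothetical, and which locale names are installed depends entirely on the operating system, so the sketch falls back to plain code point order when the requested locale is unavailable.

```python
import locale


def sort_for_locale(words, locale_name):
    """Sort words using the collation rules of locale_name (hypothetical helper).

    Falls back to plain code point order if the locale is not installed.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(words)
    try:
        # strxfrm turns each string into a key that compares per the locale
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


if __name__ == "__main__":
    # "de_DE.UTF-8" is only an example name; it may not exist on your system.
    print(sort_for_locale(["Zebra", "Äpfel", "Apfel"], "de_DE.UTF-8"))
```

Note that setlocale changes process-wide state, which is one reason this kind of thing sits awkwardly with a simple str method.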

The Unicode standard offers a reasonably generic set of tools,
including for case folding. If you feel like delving deep, the
standard talks about case conversions in section 3.13 - about a
hundred and fifty pages into this document:
https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf - but as far
as I know, that's still locale-agnostic. I think (though I haven't
checked) that Python's str.upper/str.lower follow these rules.
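
A quick way to see the locale-agnostic behaviour in CPython 3 is to look at a few well-known cases. This is only a small probe, not a statement about which tables the implementation uses; the dictionary name samples is just for illustration.

```python
# Case conversions with the built-in str methods; no locale is consulted.
samples = {
    "sharp_s_upper": "\u00df".upper(),           # German sharp s: one char becomes two
    "sharp_s_casefold": "\u00df".casefold(),     # casefold() is meant for comparisons
    "dotted_i_upper": "i".upper(),               # plain I, not the Turkish dotted capital
    "dotless_i_upper": "\u0131".upper(),         # Turkish dotless i also maps to plain I
    "capital_dotted_i_lower": "\u0130".lower(),  # i followed by COMBINING DOT ABOVE
}

if __name__ == "__main__":
    for name, value in samples.items():
        print(name, [f"U+{ord(ch):04X}" for ch in value])
```

The Turkish cases are the telling ones: "i" and the dotless "ı" both uppercase to "I", so a round trip through upper() and lower() cannot be right for Turkish text.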

Anything non-generic would be a gigantic task, not well suited to the
core string type, as it would need to be extremely context-sensitive.
Anyone who needs that kind of functionality should probably be
reaching for the locale module for other reasons anyway, so IMO that
would be a better place for a case-conversion toolkit.
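
To give a feel for what even the simplest language-specific rule looks like, here is a sketch of Turkish/Azerbaijani-aware case conversion. The function names and the lang parameter are hypothetical; nothing like this exists in the locale module today, and the sketch ignores the rest of the tailoring rules (combining marks, Lithuanian, and so on).

```python
# Hypothetical helpers: pre-map the dotted/dotless i pairs, then defer to str.

DOTTED_LANGS = {"tr", "az"}  # languages that distinguish dotted and dotless i


def upper_for_language(text: str, lang: str = "") -> str:
    """Uppercase text, mapping i to U+0130 for Turkish and Azerbaijani."""
    if lang in DOTTED_LANGS:
        text = text.replace("i", "\u0130")
    # The dotless U+0131 already uppercases to plain I by default.
    return text.upper()


def lower_for_language(text: str, lang: str = "") -> str:
    """Lowercase text, mapping I to U+0131 for Turkish and Azerbaijani."""
    if lang in DOTTED_LANGS:
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


if __name__ == "__main__":
    print(upper_for_language("istanbul", "tr"))
    print(lower_for_language("ISPARTA", "tr"))
    print(upper_for_language("istanbul"))
```

Even this tiny case needs the caller to say which language the text is in, which is exactly the context a bare str.upper() doesn't have.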

ChrisA

Chris Angelico