
On Wed, 21 Dec 2022 at 17:39, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
> Oh yeah. Good points! Do we need a PEP for str.upper() to grow an optional 'locale' argument? I feel like there are examples other than the Turkish i's where this matters, but it's past my bedtime, so they aren't coming to mind.

I don't think str.upper() is the place for it; Python has a locale module that is a better fit for this. (That's where you'd go if you want to alphabetize strings with proper respect to language, for instance.)

But it's a difficult problem. Some languages have different case-folding rules depending on whether you're uppercasing a name or some other word. German needs to know whether something's a noun, because nouns keep an initial capital letter even in otherwise lowercased text.

The Unicode standard offers a reasonably generic set of tools, including for case folding. If you feel like delving deep, the standard talks about case conversions in section 3.13, about a hundred and fifty pages into this document:

https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf

As far as I know, though, that's still locale-agnostic. I think (though I haven't checked) that Python's str.upper/str.lower follow these rules.

Anything non-generic would be a gigantic task, not well suited to the core string type, as it would need to be extremely context-sensitive. Anyone who needs that kind of functionality should probably be reaching for the locale module for other reasons anyway, so IMO that would be a better place for a case-conversion toolkit.

ChrisA
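
To illustrate the locale-agnostic behaviour discussed above, here is a small sketch of what str.upper() and str.lower() do with the Turkish i's and the German sharp s. The turkish_upper() helper is purely hypothetical (it is not part of Python or the locale module) and is only a rough approximation of what a Turkish-aware conversion would have to do for these two letters.

```python
# Default (locale-agnostic) case mappings as applied by str.upper()/str.lower().
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# ASCII i and I always map to each other, whatever language the text is in.
ascii_upper = "i".upper()
ascii_lower = "I".lower()

# The Turkish-specific letters still have default mappings of their own.
dotless_upper = DOTLESS_SMALL_I.upper()   # plain ASCII "I"
dotted_lower = DOTTED_CAPITAL_I.lower()   # "i" followed by U+0307 COMBINING DOT ABOVE

# One-to-many mappings are applied too: sharp s uppercases to "SS".
sharp_s_upper = "stra\u00dfe".upper()


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using the Turkish i/I pairing.

    Only handles the dotted/dotless i distinction; a real locale-aware
    conversion would need far more context than this.
    """
    return (
        text.replace("i", DOTTED_CAPITAL_I)
        .replace(DOTLESS_SMALL_I, "I")
        .upper()
    )


if __name__ == "__main__":
    print(repr(ascii_upper), repr(dotted_lower), repr(sharp_s_upper))
    print(turkish_upper("istanbul"))
```

The point of the sketch is that the built-in methods give the same answer for "i" regardless of the language of the surrounding text, so any Turkish-specific result has to come from code that is told the language explicitly.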