
On 24Dec2022 09:11, Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, 24 Dec 2022 at 09:07, Cameron Simpson <cs@cskk.id.au> wrote:
>> On 23Dec2022 22:27, Chris Angelico <rosuav@gmail.com> wrote:
>>> I think this would be a useful feature to have, although it'll probably end up needing a LOT of information (you can't just say "give me a locale-correct uppercasing of this string" without further context). So IMO it should be third-party.
>> It would probably be good to have a caveat mentioning these context difficulties in the docs of the unicodedata and str/string case fiddling methods. Not a complete exposition, but making it clear that for some languages the rules require context, maybe with a hard-to-implement-correctly example of naive/incorrect use.
> Do people actually read those warnings?
I have read them, I think, though not for a while.
> Hang on, lemme pop into the time machine and add one to the docstring and docs for str.upper(). Okay, I'm back. Tell me, have you read the docstring?
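As a rough illustration of the kind of naive use such a caveat might show, here is a minimal sketch. It is not proposed docs wording, and `turkish_upper` is a hypothetical helper invented for this example, not anything in Python or a third-party library:

```python
# str.upper() and str.lower() apply the default (locale-independent)
# Unicode case mappings. That is fine for many languages, wrong for some.

word = "istanbul"
default_upper = word.upper()      # 'ISTANBUL': Turkish expects a dotted capital I

def turkish_upper(s):
    """Hypothetical helper: map 'i' to U+0130 (capital I with dot above)
    before uppercasing. Only correct if the caller already knows the text
    is Turkish, which is exactly the context str.upper() does not have."""
    return s.replace("i", "\u0130").upper()

turkish = turkish_upper(word)     # '\u0130STANBUL'

# German sharp s: the result is longer than the input and does not round-trip.
sharp = "stra\u00dfe".upper()     # 'STRASSE'
round_trip = sharp.lower()        # 'strasse', not the original word

# Greek capital sigma: the lowercase form depends on position in the word.
# This is one context rule that str.lower() does apply.
final_sigma = "\u039f\u0394\u039f\u03a3".lower()   # ends with final sigma U+03C2
```

The Turkish case is the awkward one: nothing in the string itself says which language it is in, so the extra context has to come from the caller.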
    Python 3.9.13 (main, Aug 11 2022, 14:01:42)
    [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> help(str.upper)
    Help on method_descriptor:

    upper(self, /)
        Return a copy of the string converted to uppercase.

Hmm. Did you commit the change? Is the key to the time machine back on its hook?

Docs:

    str.upper()
        Return a copy of the string with all the cased characters [4]
        converted to uppercase. Note that s.upper().isupper() might be
        False if s contains uncased characters or if the Unicode category
        of the resulting character(s) is not “Lu” (Letter, uppercase), but
        e.g. “Lt” (Letter, titlecase).

        The uppercasing algorithm used is described in section 3.13 of the
        Unicode Standard.

and [4] here:

    Cased characters are those with general category property being one
    of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt”
    (Letter, titlecase).
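A few of the statements in that docs text can be checked directly. This is only a small sketch; the sample strings are arbitrary choices for illustration:

```python
import unicodedata

# "Return a copy": strings are immutable, so the original is untouched.
s = "hello"
t = s.upper()                     # 'HELLO'; s is still 'hello'

# "s.upper().isupper() might be False if s contains uncased characters":
# a string with no cased characters at all is never reported as uppercase.
digits = "1234"
no_cased = digits.upper().isupper()        # False

# A titlecase ("Lt") character is cased, but isupper() does not count it
# as uppercase. U+01C5 is LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON.
dz = "\u01c5"
dz_category = unicodedata.category(dz)     # 'Lt'
dz_is_upper = dz.isupper()                 # False
```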
wording that clarifies whether x.upper() uppercases the string in-place?
Well, it says "a copy", so I'd say it's clear.

I've only got version 5.0 of Unicode here. [steps into the other room...] Thank you, I see you used the time machine to buy me version 9.0 too :-)

Ah, 3.13 is 7 pages of compact text here. I was thinking of something a bit more general, like "case changing is a complex, language- and context-dependent process, and str.upper (etc.) therefore performs a simplistic operation".

Cheers,
Cameron Simpson <cs@cskk.id.au>