[Python-ideas] Re: Idea: Tagged strings in python

Dec. 21, 2022

On Wed, Dec 21, 2022 at 01:18:46AM -0500, David Mertz, Ph.D. wrote:

> I'm on my tablet, so cannot test at the moment. But is `str.upper()` REALLY
> wrong about the Turkish dotless I (and dotted capital I) currently?!

It has to be. Turkic languages like Turkish, Azerbaijani and Tatar 
distinguish dotted and non-dotted I's, leading to a slew of problems 
infamously known as "The Turkish I problem".

(Other languages use undotted i's but not in the same way, e.g. Irish 
road signs in Gaelic usually drop the dot to avoid confusion with í. And 
don't confuse the undotted i with the Latin iota ɩ, which is a 
completely different letter to the Greek iota ι. Alphabets are hard.)

In Turkic languages, we have:

    Letter:       ı    I    i    İ
    -----------  ---  ---  ---  ---
    Lowercase:    ı    ı    i    i
    Uppercase:    I    I    İ    İ

Swapping case can never add or remove a dot. (The technical name for the 
dot is "tittle".) Which is perfectly logical, of course.
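
As a rough illustration of the table above, here is a minimal sketch of
Turkic-aware case conversion. The names `turkic_upper` and `turkic_lower`
are hypothetical helpers, not anything in the standard library, and the
sketch assumes precomposed characters only (it does not handle a plain I
followed by a combining dot).

```python
# Hypothetical helpers: remap the four I's before the default conversion.
_TURKIC_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TURKIC_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def turkic_upper(text: str) -> str:
    """Uppercase text so that i becomes İ and ı becomes I."""
    # Translate the lowercase I's first; str.upper() leaves İ and I alone.
    return text.translate(_TURKIC_UPPER).upper()


def turkic_lower(text: str) -> str:
    """Lowercase text so that İ becomes i and I becomes ı."""
    # Translate the uppercase I's first; str.lower() leaves i and ı alone.
    return text.translate(_TURKIC_LOWER).lower()


print(turkic_upper("ıIiİ"))  # IIİİ
print(turkic_lower("ıIiİ"))  # ııii
```

Doing the translation before calling the built-in method means everything
other than the four I's still goes through the normal Unicode rules.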

But most other people with Latin-based alphabets mix the dotted and 
dotless letters together, leading to this lossy table:

    Letter:       ı    I    i    İ
    -----------  ---  ---  ---  ---
    Lowercase:    ı    i    i    i
    Uppercase:    I    I    I    İ

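This is the default, locale-independent Unicode case conversion, which 
Python follows. (Strictly speaking, İ lowercases to i followed by U+0307 
COMBINING DOT ABOVE, so it still displays with a dot but is no longer a 
single code point.)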

    >>> "ıIiİ".lower()
    'ıiii̇'
    >>> "ıIiİ".upper()
    'IIIİ'

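
To see what the default rules actually produce, it can help to list the
code points. This is only a small sketch; `code_points` is a hypothetical
helper name.

```python
import unicodedata


def code_points(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Lowercasing İ gives two code points, not one.
for pair in code_points("İ".lower()):
    print(pair)
# ('U+0069', 'LATIN SMALL LETTER I')
# ('U+0307', 'COMBINING DOT ABOVE')

# A round trip through the default rules loses the dotless ı.
print("ı".upper().lower() == "i")  # True
```
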
Just to make the Turkish I problem even more exciting, you aren't 
supposed to use Turkish rules when changing the case of foreign proper 
nouns. So the popular children's book "Alice Harikalar Diyarında" (Alice 
in Wonderland) should use *both* sets of rules when uppercasing to give 
"ALICE HARİKALAR DİYARINDA".
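
A rough sketch of how that mixing might look, assuming you already know
which words are foreign proper nouns. The `FOREIGN_WORDS` set and the
`upper_title` function are hypothetical, and working out which words
belong in the set is the hard part that this sketch skips.

```python
# Hypothetical: words that keep the default (non-Turkic) case rules.
FOREIGN_WORDS = {"Alice"}

# Turkic uppercase mapping for the two lowercase I's.
_TURKIC_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def upper_title(title, foreign=FOREIGN_WORDS):
    """Uppercase a Turkish title, word by word, with mixed rules."""
    words = []
    for word in title.split(" "):
        if word in foreign:
            # Foreign proper noun: default Unicode uppercasing.
            words.append(word.upper())
        else:
            # Turkish word: remap i and ı first, then uppercase the rest.
            words.append(word.translate(_TURKIC_UPPER).upper())
    return " ".join(words)


print(upper_title("Alice Harikalar Diyarında"))
# ALICE HARİKALAR DİYARINDA
```
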

Sometimes the dot can be very significant.

https://gizmodo.com/a-cellphones-missing-dot-kills-two-people-puts-three-m-3...

> That feels like a BPO needed if true.

We do whatever the Unicode standard says to do. They say that 
localisation issues are out of scope for Unicode.

-- 
Steve

Steven D'Aprano