
On Thursday, July 21, 2016 at 12:51:27 PM UTC+5:30, Chris Angelico wrote:
On Thu, Jul 21, 2016 at 4:26 PM, Rustom Mody <rusto...@gmail.com> wrote:
IOW:
1. Disallow co-existence of confusables (in identifiers)
2. Identify confusables to a normal form, like case-insensitive comparison and like NFKC
3. Leave the confusables to confuse

My choice: 1 better than 2 better than 3
So should we disable the lowercase 'l', the uppercase 'I', and the digit '1', because they can be confused? What about the confusability of "m" and "rn"? O and 0 are similar in some fonts. And case insensitivity brings its own problems - is "ss" equivalent to "ß", and is "ẞ" equivalent to either? Turkish distinguishes between "i", which upper-cases to "İ", and "ı", which upper-cases to "I".
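For anyone who wants to poke at the case-mapping examples, here is a small sketch using only Python 3's built-in str methods. The helper name same_caseless is made up for illustration; the mappings shown are Python's locale-independent defaults, so the Turkish behaviour described above is exactly what you do not get without extra work.

```python
# Sketch: Python's default (locale-independent) Unicode case mappings.
SHARP_S = "\u00df"          # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S
DOTTED_CAPITAL_I = "\u0130" # İ LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"        # ı LATIN SMALL LETTER DOTLESS I


def same_caseless(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after full case folding."""
    return a.casefold() == b.casefold()


# Under case folding, "ss", "ß" and "ẞ" all collapse together.
print(same_caseless("ss", SHARP_S))
print(same_caseless(SHARP_S, CAPITAL_SHARP_S))

# Upper-casing "ß" yields two characters rather than "ẞ".
print(SHARP_S.upper())

# No Turkish tailoring: "i" and "ı" both upper-case to plain "I".
print("i".upper(), DOTLESS_I.upper())

# Lower-casing "İ" gives "i" followed by U+0307 COMBINING DOT ABOVE.
print([f"U+{ord(c):04X}" for c in DOTTED_CAPITAL_I.lower()])
```

So a case-insensitive identifier rule would have to pick one of these behaviours and accept that it is wrong for somebody.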
We already have interminable debates about letter similarities across scripts. I'm sure everyone agrees that Cyrillic "и" is not the same letter as Latin "i", but we have "AАΑ" in three different scripts. Should they be considered equivalent? I think not, because in any non-trivial context, you'll know whether the program's been written in Greek, a Slavic language, or something using the Latin script. But maybe you disagree. Okay; are "BВΒ" all to be considered equivalent too? What about "СC"? "XХΧᚷ"? They're visually similar, but they're not equivalent in any other way. And if you're going to say things should be considered equivalent solely on the basis of visuals, you get into a minefield - should U+200B ZERO WIDTH SPACE be completely ignored, allowing "AB" to be equivalent to "A\u200bB" as an identifier?
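A rough sketch of how this looks from inside Python 3, using only the standard unicodedata module and str.isidentifier(). The names LOOKALIKES and nfkc are made up for illustration. It is not a confusables detector; it only shows that NFKC, which Python applies to identifiers, leaves the cross-script lookalikes distinct, and that U+200B is rejected in an identifier rather than ignored.

```python
import unicodedata

# Three visually similar capitals, written as escapes to keep them apart.
LOOKALIKES = ["A", "\u0410", "\u0391"]  # Latin, Cyrillic, Greek


def nfkc(text: str) -> str:
    """Hypothetical helper: NFKC-normalize a string."""
    return unicodedata.normalize("NFKC", text)


for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC does not merge letters across scripts: still three distinct strings.
print(len({nfkc(ch) for ch in LOOKALIKES}))

# NFKC does merge compatibility forms, e.g. the "fi" ligature U+FB01.
print(nfkc("\ufb01le") == "file")

# ZERO WIDTH SPACE is not ignored; the string is simply not an identifier.
print("AB".isidentifier(), "A\u200bB".isidentifier())
```

In other words, option 2 as implemented by NFKC covers compatibility variants only; the visually similar letters from different scripts fall under option 3 today.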
I said 1 better than 2 better than 3.
Maybe you also want to add:
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
followed by
    Errors should never pass silently.
IOW setting out 1 better than 2 better than 3 does not necessarily imply it's completely achievable.