
On Thursday, July 21, 2016 at 11:45:11 AM UTC+5:30, Danilo J. S. Bellini wrote:
2016-07-21 1:53 GMT-03:00 Pavol Lisy <pavol...@gmail.com>:
On 7/20/16, Danilo J. S. Bellini <danilo....@gmail.com> wrote:
4. Unicode has more than one codepoint for some symbols that look alike; for example, "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. There's also "∑", but this one is invalid in Python 3. The italic/bold/serif difference seems enough to tell them apart, and when editing code with a Unicode char like that, most people would probably copy and paste the symbol instead of typing it, leading to consistent use of the same symbol.
I am not sure what you mean to say, so just in case, some info:
PEP-3131 (https://www.python.org/dev/peps/pep-3131/): "All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC."
From this point of view, all the sigmas are the same:
import unicodedata
set(unicodedata.normalize('NFKC', i) for i in "Σ𝚺𝛴𝜮𝝨𝞢") == {'Σ'}
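A slightly longer sketch of the same point, in case it helps. It assumes CPython 3 and only uses the standard library; the variable names (`sigmas`, `namespace`, and so on) are made up for illustration. It checks the NFKC folding, then shows that two differently styled sigmas end up as one variable once the source is compiled, and that "∑" is not accepted as an identifier.

```python
import unicodedata

sigmas = "Σ𝚺𝛴𝜮𝝨𝞢"

# Every styled sigma folds to the plain Greek capital sigma under NFKC.
normalized = {unicodedata.normalize("NFKC", ch) for ch in sigmas}

# Identifiers are normalized when the source is compiled, so both
# assignments below bind the very same name.
namespace = {}
exec("Σ = 1\n𝚺 = Σ + 1", namespace)
user_names = sorted(k for k in namespace if not k.startswith("__"))

# U+2211 N-ARY SUMMATION is a math symbol, not a letter, so it cannot
# start an identifier.
summation_ok = "\u2211".isidentifier()

print(normalized, user_names, namespace["Σ"], summation_ok)
```

So after the second assignment there is a single name, holding 2, whichever sigma was typed.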
In this item I just said that most programmers would probably keep the same character throughout a source file because they copy and paste it, and that even when they don't (say, when using another input method), visual differences like italic/bold/serif are enough for one to notice.
At first, I was thinking of code with one of those symbols (any of them) as a variable name, but PEP 3131 challenges that. Actually, any conversion to a normal form means that one should never use Unicode identifiers outside the chosen normal form. It would be better to raise an error instead of converting.
Yes, agreed. I said “Nice!” for
>>> Σ = 1
>>> 𝚺 = Σ + 1
>>> 𝛴
2
in comparison to:
>>> А = 1
>>> A = A + 1
because the A's are harder to tell apart than the sigmas, yet internally more distinct.
If the choice is to simply disallow the confusables, that's probably the best choice. IOW:
1. Disallow co-existence of confusables (in identifiers)
2. Identify confusables to a normal form, like case-insensitive comparison and like NFKC
3. Leave the confusables to confuse
My choice: 1 is better than 2, which is better than 3.
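To make option 1 a bit more concrete, here is a rough sketch of what a "no co-existing confusables" check could look like. Everything in it is hypothetical: `SAMPLE_CONFUSABLES` is a tiny hand-picked table (a real checker would need the full Unicode confusables data, which is not in the standard library), and `skeleton` and `find_confusable_clashes` are names invented for this example, not anything Python provides.

```python
import unicodedata

# Hypothetical, hand-picked sample of look-alike mappings. A real tool
# would load a complete table from the Unicode confusables data.
SAMPLE_CONFUSABLES = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A -> LATIN CAPITAL LETTER A
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA -> LATIN CAPITAL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
}

def skeleton(name):
    """Map an identifier to a look-alike canonical form (illustrative only)."""
    normalized = unicodedata.normalize("NFKC", name)
    return "".join(SAMPLE_CONFUSABLES.get(ch, ch) for ch in normalized)

def find_confusable_clashes(names):
    """Return groups of distinct identifiers that share one skeleton."""
    groups = {}
    for name in names:
        # Names that are already equal after NFKC are the same identifier
        # to Python, so they collapse into a single set entry.
        canonical = unicodedata.normalize("NFKC", name)
        groups.setdefault(skeleton(name), set()).add(canonical)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Cyrillic А and Latin A clash; the two sigmas are one identifier already.
print(find_confusable_clashes(["\u0410", "A", "Σ", "𝚺"]))
```

With this sample table, the Cyrillic/Latin A pair is reported as a clash (what option 1 would reject), while the styled sigmas are not, since NFKC has already made them the same name.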