2016-07-21 1:53 GMT-03:00 Pavol Lisy <pavol.lisy@gmail.com>:
On 7/20/16, Danilo J. S. Bellini <danilo.bellini@gmail.com> wrote:

> 4. Unicode has more than one codepoint for some symbols that look alike,
> for example "Σ𝚺𝛴𝜮𝝨𝞢" are all valid uppercase sigmas. There's also "∑",
> but this one is invalid in Python 3. The italic/bold/serif distinction
> seems enough to tell them apart, and when editing code with a Unicode
> char like that, most people would probably copy and paste the symbol
> instead of typing it, leading to a consistent use of the same symbol.

I am not sure what you mean, so just in case, here is some info:

PEP-3131 (https://www.python.org/dev/peps/pep-3131/): "All identifiers
are converted into the normal form NFKC while parsing; comparison of
identifiers is based on NFKC."

From this point of view, all sigmas are the same:

  set(unicodedata.normalize('NFKC', i) for i in "Σ𝚺𝛴𝜮𝝨𝞢")  == {'Σ'}

In this item I only meant that most programmers would probably keep the same character throughout a source file because they copy and paste it, and that even when they don't (say, when using another input method), visual differences like italic/bold/serif are enough for one to notice.

At first, I was thinking of code with one of those symbols as a variable name (any of them), but PEP 3131 challenges that. Actually, any conversion to a normal form means that one should never use Unicode identifiers outside the chosen normal form. It would be better to raise an error instead of converting. If there isn't already a lint tool complaining about that, I strongly believe that's something that should be done. When mixing strings and identifier names, the result is not so predictable:

>>> obj = type("SomeClass", (object,), {c: i for i, c in enumerate("Σ𝚺𝛴𝜮𝝨𝞢")})()
>>> obj.𝞢 == getattr(obj, "𝞢")
False
>>> obj.Σ == getattr(obj, "Σ")
True
>>> dir(obj)
[..., 'Σ', '𝚺', '𝛴', '𝜮', '𝝨', '𝞢']
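
The lint check suggested above could look roughly like the sketch below. It is only an illustration, not an existing tool: the function name find_non_nfkc_names is made up here, and it assumes that the standard tokenize module yields identifiers exactly as they are written in the source (before the parser normalizes them), so each NAME token can be compared with its NFKC form.

```python
import io
import tokenize
import unicodedata


def find_non_nfkc_names(source):
    """Return (line, column, name, normalized) for identifiers not in NFKC form.

    Hypothetical helper: a sketch of a lint check, not part of any real tool.
    """
    problems = []
    # Tokenize the raw source text; NAME tokens cover identifiers and keywords.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        normalized = unicodedata.normalize("NFKC", tok.string)
        # A name that differs from its NFKC form will be silently rewritten
        # by the parser, so report it.
        if normalized != tok.string:
            problems.append((tok.start[0], tok.start[1], tok.string, normalized))
    return problems


if __name__ == "__main__":
    example = "𝞢 = 1\nΣ = 2\nx = 𝚺 + Σ\n"
    for line, col, name, normalized in find_non_nfkc_names(example):
        print(f"line {line}, col {col}: {name!r} will be read as {normalized!r}")
```

A check like this only looks at identifiers in the source; strings passed to getattr or used as dictionary keys are left alone, which is exactly where the mismatch in the session above comes from.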

Danilo J. S. Bellini
"It is not our business to set up prohibitions, but to arrive at conventions." (R. Carnap)