On 7/21/16, Danilo J. S. Bellini <danilo.bellini@gmail.com> wrote:
2016-07-21 1:53 GMT-03:00 Pavol Lisy <pavol.lisy@gmail.com>:
set(unicodedata.normalize('NFKC', i) for i in "Σ𝚺𝛴𝜮𝝨𝞢") == {'Σ'}
In this item I only said that most programmers would probably keep the same character throughout a source file because they copy and paste it, and that even without copying and pasting (when using another input method), visual differences like italic/bold/serif are enough for one to notice.
At first, I was thinking of code with one of those symbols as a variable name (any of them), but PEP 3131 challenges that. Actually, any conversion to a normal form means that one should never use Unicode identifiers outside the chosen normal form. It would be better to raise an error instead of converting. If no lint tool complains about that already, I strongly believe one should. When mixing strings and identifier names, the result is not so predictable:

    obj = type("SomeClass", (object,), {c: i for i, c in enumerate("Σ𝚺𝛴𝜮𝝨𝞢")})()
    obj.𝞢 == getattr(obj, "𝞢")  # False
    obj.Σ == getattr(obj, "Σ")  # True
    dir(obj)  # [..., 'Σ', '𝚺', '𝛴', '𝜮', '𝝨', '𝞢']

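A rough sketch of the kind of lint check suggested above, assuming the standard tokenize module hands back identifiers as they are spelled in the source, before the compiler normalizes them. The function name find_non_nfkc_identifiers is hypothetical, not an existing tool:

```python
import io
import tokenize
import unicodedata


def find_non_nfkc_identifiers(source):
    """Return (line, name, normalized) for each NAME token not already in NFKC form.

    Hypothetical helper for illustration only; it reports, it does not fix.
    """
    problems = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.NAME:
            continue
        # PEP 3131: identifiers are converted to NFKC while parsing.
        normalized = unicodedata.normalize("NFKC", tok.string)
        if normalized != tok.string:
            problems.append((tok.start[0], tok.string, normalized))
    return problems


sample = "𝚺 = 1\nΣ = 2\nprint(Σ)\n"
for line, name, normalized in find_non_nfkc_identifiers(sample):
    print(f"line {line}: identifier {name!r} is not in NFKC form (becomes {normalized!r})")
```

Only the first line of the sample is reported, since the plain Greek Σ is already in NFKC form. Whether such a finding should be a warning or a hard error is a policy choice left open here.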
    [getattr(obj, i) for i in dir(obj) if i in "Σ𝚺𝛴𝜮𝝨𝞢"]  # [0, 1, 2, 3, 4, 5]

but:

    [obj.Σ, obj.𝚺, obj.𝛴, obj.𝜮, obj.𝝨, obj.𝞢, ]  # [0, 0, 0, 0, 0, 0]

So you could mix any of them while editing identifiers (but you could not mix them while writing parameters in getattr, setattr and type).

But getattr, setattr and type are other beasts, because they can use "non identifiers", non-letter characters too:

    setattr(obj, '+', 7)
    dir(obj)  # ['+', ...]
    # but obj.+ is a syntax error

    setattr(obj, u"\udcb4", 7)
    dir(obj)  # [..., '\udcb4', ...]

    obj = type("SomeClass", (object,), {c: i for i, c in enumerate("+-*/")})()

Maybe there is still some Babel curse here, and some sort of normalize_dir, normalize_getattr, normalize_setattr, normalize_type could help? I am not sure. They would probably make things more complicated rather than simpler.
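To make the idea concrete, here is a minimal sketch of what two of those helpers might look like, assuming they would simply apply the same NFKC normalization the compiler applies to identifiers. The names normalize_getattr and normalize_setattr come from the question above; nothing like them exists in the standard library, and normalize_name and Namespace are made up for the example:

```python
import unicodedata


def normalize_name(name):
    """Apply NFKC, the normal form PEP 3131 prescribes for identifiers."""
    return unicodedata.normalize("NFKC", name)


def normalize_getattr(obj, name, *default):
    # Hypothetical wrapper: same call shape as getattr, with the name normalized first.
    return getattr(obj, normalize_name(name), *default)


def normalize_setattr(obj, name, value):
    # Hypothetical wrapper: stores the attribute under its NFKC-normalized name.
    setattr(obj, normalize_name(name), value)


class Namespace:
    pass


obj = Namespace()
normalize_setattr(obj, "𝞢", 5)
print(obj.Σ)                                   # 5
print(normalize_getattr(obj, "𝚺"))             # 5
print(obj.𝞢 == normalize_getattr(obj, "𝞢"))   # True
```

With these wrappers the string-based and identifier-based spellings agree, but names such as '+' still pass through untouched, so the "non identifier" case stays as it is.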