On 9/30/12, Steven D'Aprano wrote:
On 01/10/12 00:00, Oscar Benjamin wrote:
py> A = 42
py> Α = 23
py> A == Α
False
It will never be possible to catch all confusables, which is one reason that the Unicode property stalled. It seems like it would be reasonable to at least warn when identifiers are not all in the same script -- but real-world examples from Emacs Lisp made it clear that mixing scripts is often intentional. Those examples still had clear word boundaries, but it wasn't clear how detecting those boundaries could be properly automated in the general case.
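For what it's worth, the "warn on mixed scripts" idea can be roughed out as below. This is only a sketch: unicodedata does not expose the Script property, so I am assuming the first word of each character's Unicode name is a good enough stand-in, and the names scripts_in and is_mixed_script are made up for illustration.

```python
import unicodedata


def scripts_in(identifier):
    """Return the approximate set of scripts used by the letters in identifier.

    Assumption: the first word of a character's Unicode name ("LATIN",
    "GREEK", "CYRILLIC", ...) stands in for the real Script property,
    which the unicodedata module does not provide.
    """
    scripts = set()
    for ch in identifier:
        if not ch.isalpha():
            continue  # skip digits, underscores and other non-letters
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the letters of identifier appear to come from more than one script."""
    return len(scripts_in(identifier)) > 1


# "p\u0430ypal" has CYRILLIC SMALL LETTER A in place of the Latin "a".
for candidate in ("paypal", "p\u0430ypal", "x_1"):
    print(ascii(candidate), is_mixed_script(candidate))
```

Note that this flags the whole identifier; it does nothing about the intentional mixing mentioned above, where each word is in a single script but the identifier as a whole is not.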
Besides, just because you and I can't distinguish A from Α in my editor, using one particular choice of font, doesn't mean that the author or his intended audience (Greek programmers perhaps?) can't distinguish them,
In many cases, it does -- for the letters to look different requires an unnatural font choice, though perhaps not so extreme as the print-the-hex-code font.
I would welcome "confusable detection" in the standard library, possibly a string method "skeleton" or some other interface to the Confusables file, perhaps in unicodedata.
I would too, and agree that it shouldn't be limited to identifiers.

-jJ
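To make the "skeleton" idea concrete, here is a minimal sketch of what such an interface might look like. Nothing like this exists in unicodedata today; the function names skeleton and confusable are hypothetical, and the CONFUSABLES table is a tiny hand-picked excerpt standing in for the real Confusables file. The steps (NFD, map each character to its prototype, NFD again) follow my understanding of the skeleton transform in UTS #39.

```python
import unicodedata

# Hypothetical, hand-picked excerpt. A real implementation would load the
# full mapping from Unicode's confusables.txt instead.
CONFUSABLES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(text):
    """Map text to a form in which known look-alike characters compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b):
    """True if a and b are different strings that share a skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(confusable("A", "\u0391"))  # Latin A vs. Greek Alpha
print(confusable("A", "B"))
```

Because it works on arbitrary strings, the same check applies to file names, URLs or user names, not just identifiers.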