[Python-ideas] Visually confusable unicode characters in identifiers

Jim Jewett jimjjewett at gmail.com
Mon Oct 1 17:43:04 CEST 2012

On 9/30/12, Steven D'Aprano <steve at pearwood.info> wrote:
> On 01/10/12 00:00, Oscar Benjamin wrote:

> py> A = 42
> py> Α = 23
> py> A == Α
> False

It will never be possible to catch all confusables, which is one
reason that the Unicode property stalled.

It seems like it would be reasonable to at least warn when an
identifier mixes characters from more than one script -- but real-world
examples from Emacs Lisp made it clear that such mixing is often
intentional.  Those identifiers still had clear word boundaries, but it
wasn't clear how word-boundary detection could be properly automated in
the general case.
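
To make the mixed-script warning concrete, here is a rough sketch.  It
assumes the first word of a character's Unicode name ("LATIN", "GREEK",
...) is a good enough stand-in for its script, since unicodedata does
not expose the Script property; the function names are made up for
illustration.

```python
import unicodedata


def scripts_in(identifier):
    """Approximate the set of scripts used by the letters of an identifier.

    Assumption: the first word of the Unicode character name stands in
    for the script, because unicodedata has no Script property.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():  # skip digits and underscores
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(identifier):
    """Return True if the identifier's letters come from more than one script."""
    return len(scripts_in(identifier)) > 1


# "\u0391" is GREEK CAPITAL LETTER ALPHA, "\u03bb" is the Greek small lambda.
print(is_mixed_script("Alpha"))        # all Latin
print(is_mixed_script("\u0391lpha"))   # Greek capital followed by Latin
print(is_mixed_script("\u03bb_max"))   # Greek word, underscore, Latin word
```

The last case is the sort of thing that is probably intentional: each
word is in a single script, but a naive check flags it anyway.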

> Besides, just because you and I can't distinguish A from Α in my editor,
> using one particular choice of font, doesn't mean that the author or his
> intended audience (Greek programmers perhaps?) can't distinguish them,

In many cases, it does -- for the letters to look different requires
an unnatural font choice, though perhaps not so extreme as the
print-the-hex-code font.

> I would welcome "confusable detection" in the standard library, possibly a
> string method "skeleton" or some other interface to the Confusables file,
> perhaps in unicodedata.

I would too, and agree that it shouldn't be limited to identifiers.
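
For what it's worth, a "skeleton" helper might look roughly like the
sketch below.  The CONFUSABLES table is a tiny hand-picked stand-in for
the real Confusables file, and the names skeleton() and confusable()
are hypothetical, not an existing unicodedata API.

```python
import unicodedata

# Hypothetical, hand-picked excerpt; the real confusables data is far larger.
CONFUSABLES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u039f": "O",  # GREEK CAPITAL LETTER OMICRON
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
}


def skeleton(text):
    """Decompose, map each character to its prototype, then decompose again."""
    text = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b):
    """Return True if two distinct strings share the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(confusable("A", "\u0391"))  # Latin A vs. Greek Alpha
print(confusable("A", "B"))
```

Since it works on arbitrary strings, nothing here is specific to
identifiers.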

