[Python-ideas] Visually confusable unicode characters in identifiers

Steven D'Aprano steve at pearwood.info
Sun Sep 30 17:10:18 CEST 2012


On 01/10/12 00:00, Oscar Benjamin wrote:
> Having just discovered that PEP 3131 [1] enables me to use greek letters to
> represent variables in equations, it was pointed out to me that it also
> allows visually confusable characters in identifiers [2].

You don't need PEP 3131 to have visually confusable identifiers.

MyObject = My0bject = "many fonts use the same glyph for O and 0"

rn = m = 23  # try reading this in Ariel with a small font size

x += l


I don't think it's up to Python to protect you from arbitrarily poor choices
in identifiers and typefaces, or against obfuscated code (whether deliberately
so or by accident). Use of confusable identifiers is a code-quality issue,
little different from any other code-quality issue:

class myfunction:
     def __init__(a, b, c, d, e, f, g, h, i, j, k, l):
         a.b = b-e+k*h
         a.a = i + 1j*j
         a.l = ll + l1 + l
         a.somebodytoldmeishouldusemoredesccriptivevaraiblenames = g+d
         a.somebodytoldmeishouldusemoredesccribtivevaraiblenames = c+f

You surely wouldn't expect Python to protect you from ignorant or obnoxious
programmers who wrote code like that. I likewise don't think Python should
protect you from programmers who do things like this:

py> A = 42
py> Α = 23
py> A == Α
False

Besides, just because you and I can't distinguish A from Α in my editor,
using one particular choice of font, doesn't mean that the author or his
intended audience (Greek programmers perhaps?) can't distinguish them, using
their editor and a more suitable typeface. The two characters are distinct
using Courier or Lucinda Typewriter, to mention only two.


> Is the proposal mentioned in the PEP (to use something based on Unicode
> Technical Standard #39 [3]) something that might be implemented at any
> point?

> [3] http://unicode.org/reports/tr39/#Confusable_Detection

I would welcome "confusable detection" in the standard library, possibly a
string method "skeleton" or some other interface to the Confusables file,
perhaps in unicodedata. And I would encourage code checkers like PyFlakes,
PyLint, PyChecker to check for confusable identifiers. But I do not believe
that this should be built into the Python language itself.



-- 
Steven



More information about the Python-ideas mailing list