On Wed, Nov 03, 2021 at 11:21:53AM +1100, Chris Angelico wrote:
> TBH, I'm not entirely sure how valid it is to talk about *security* considerations when we're dealing with Python source code and variable confusions, but that's a term that is well understood.
It's not like Unicode is the only way to write obfuscated code, malicious or otherwise.
> But to the extent that it is a security concern, it's not one that linters can really cope with. I'm not sure how a linter would stop someone from publishing code on PyPI that causes confusion by its character encoding, for instance.
Do we require that PyPI prevents people from publishing code that causes confusion by its poorly written code and obfuscated and confusing identifiers?

The linter is to *flag the issue* during, say, code review or before running the code, like other code quality issues.

If you're just running random code you downloaded from the internet using pip, then Unicode confusables are the least of your worries.

I'm not really sure why people get so uptight about Unicode confusables, while being blasé about the opportunities to smuggle malicious code into pure ASCII code.

https://en.wikipedia.org/wiki/Underhanded_C_Contest

Is it unfamiliarity? Worse? "Real programmers write identifiers in English."

And the ironic thing is, while it is very difficult indeed for automated checkers to detect underhanded code in ASCII, it is far easier for editors, linters and other tools to spot the sort of Unicode confusables we're talking about here. But we spend all our energy worrying about the minor issue, and almost none on the broader problem of malicious code in general.

I'm pretty sure I could upload a library to PyPI that included

    os.system('rm -rf .')

and nobody would blink an eye, but if I write:

    A = 1
    А = 2
    Α = 3
    print(A, А, Α)

everyone goes insane.

Let's keep the threat in perspective. Writing an informational PEP for the education of people is a great idea. Rushing into making wholesale changes to the interpreter, not so much.

-- 
Steve
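
P.S. To illustrate how little it takes for a tool to spot the three look-alike names above, here is a minimal sketch using only the standard library. It is not any existing linter's implementation: the function name `non_ascii_identifiers` is made up for this example, and the check is deliberately coarse. It flags every identifier containing a non-ASCII character, which is a much blunter rule than real confusable detection, and a project that legitimately uses non-English identifiers would want something more selective.

```python
import io
import tokenize
import unicodedata


def non_ascii_identifiers(source):
    """Yield (line, identifier, character names) for each non-ASCII NAME token.

    Hypothetical helper for illustration only; it reports every non-ASCII
    identifier rather than testing for actual visual confusability.
    """
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            names = [
                unicodedata.name(ch, "UNKNOWN")
                for ch in tok.string
                if not ch.isascii()
            ]
            yield tok.start[0], tok.string, names


# The example from the message: Latin A, Cyrillic А (U+0410), Greek Α (U+0391).
SAMPLE = "A = 1\n\u0410 = 2\n\u0391 = 3\nprint(A, \u0410, \u0391)\n"

for line, ident, names in non_ascii_identifiers(SAMPLE):
    print(f"line {line}: {ident!r} contains {', '.join(names)}")
```

Run against the sample, this reports the Cyrillic and Greek names on lines 2, 3 and 4 and stays silent about the Latin `A`. Nothing comparably simple will flag a cleverly underhanded piece of pure-ASCII code.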