Serhiy Storchaka wrote:
02.11.21 16:16, Petr Viktorin wrote:
As for \0, can we ban all ASCII & C1 control characters except whitespace? I see no place for them in source code.
All control characters except CR, LF, TAB and FF are banned outside comments and string literals. I think it is worth banning them in comments and string literals too. In string literals you can use backslash escape sequences, and comments should be human-readable; there is no reason to include control characters in them.
If escape sequences were also allowed in comments (or at least in strings within comments), this would make sense. I don't like banning them otherwise, since odd characters are often a good reason to need a comment, but it is definitely a "mention, not use" situation.
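To make the proposed rule concrete, here is a minimal sketch of a scanner that reports every ASCII or C1 control character other than CR, LF, TAB and FF anywhere in a source string, including comments and string literals. The names `ALLOWED` and `find_control_chars` are hypothetical, and this is not how CPython's tokenizer does it; it only illustrates the rule being discussed.

```python
import unicodedata

# The four control characters the thread treats as acceptable whitespace.
ALLOWED = {"\t", "\n", "\r", "\f"}


def find_control_chars(source):
    """Return (line, column, code point) for each disallowed control character.

    Unicode category "Cc" covers the C0 controls, DEL and the C1 controls.
    Lines and columns are 1-based; lines are split on LF only.
    """
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits
```

For example, `find_control_chars("x = 1\n# bell \x07 here\n")` reports the BEL character in the comment as `(2, 8, "U+0007")`. Under the "mention, not use" view, the comment would instead spell the character out as text such as `\x07`.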
For homoglyphs/confusables, should there be a SyntaxWarning when an identifier looks like ASCII but isn't?
It would virtually ban Cyrillic. There are a lot of Cyrillic letters that look like Latin letters, and there are complete words written in Cyrillic that by accident look like other words written in Latin.
At the time, we considered it, and we also considered a narrower restriction on mixing multiple scripts in the same identifier, or at least in the same portion of an identifier (so mixing was OK across portions separated by _). Simplicity won, in part because of existing practice in Emacs scripting, particularly with some Asian languages.
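The narrower per-portion restriction can be sketched roughly as follows. Python's standard library does not expose the Unicode Script property, so this sketch approximates a character's script by the first word of its Unicode name; that approximation, and the names `script_of` and `mixed_script_portions`, are assumptions made for illustration only.

```python
import unicodedata


def script_of(ch):
    """Approximate a letter's script by the first word of its Unicode name.

    Returns None for characters that are not letters (digits, for example).
    """
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_portions(identifier):
    """Return (portion, scripts) for each "_"-separated portion mixing scripts."""
    bad = []
    for portion in identifier.split("_"):
        scripts = {script_of(ch) for ch in portion} - {None}
        if len(scripts) > 1:
            bad.append((portion, sorted(scripts)))
    return bad
```

With this sketch, `"s\u0441ope"` (a Latin "s" followed by a Cyrillic "с") is flagged, while an identifier with one all-Cyrillic portion and one all-Latin portion separated by `_` is not. The naive name-based approximation would also flag an identifier mixing kana and CJK ideographs, which hints at why a simple rule is hard to get right.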
This is a job for linters, which can offer many options for configuring acceptable scripts, use spelling dictionaries and dictionaries of homoglyphs, etc.
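As an illustration of what such a linter check might look like, the sketch below maps an identifier to the ASCII string it visually resembles. The `HOMOGLYPHS` table is a tiny hand-picked sample of Cyrillic letters, not a real confusables dictionary, and `ascii_lookalike` is a hypothetical name; a real linter would make the table and the accepted scripts configurable.

```python
# Tiny illustrative sample: Cyrillic small letters resembling Latin ones.
HOMOGLYPHS = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
    "\u0443": "y",
    "\u0445": "x",
}


def ascii_lookalike(identifier):
    """Return the ASCII string the identifier resembles, or None.

    None means the identifier is already pure ASCII, or it contains at
    least one non-ASCII character with no entry in HOMOGLYPHS.
    """
    if identifier.isascii():
        return None
    skeleton = []
    for ch in identifier:
        if ch.isascii():
            skeleton.append(ch)
        elif ch in HOMOGLYPHS:
            skeleton.append(HOMOGLYPHS[ch])
        else:
            return None
    return "".join(skeleton)
```

An all-Cyrillic identifier such as `"\u0441\u043e\u0440\u0435"` comes back as `"cope"`, which is exactly the case that would make a blanket SyntaxWarning fire on legitimate Cyrillic names; an identifier containing a letter with no Latin lookalike returns None.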
It might be time for the documentation to mention a specific linter/configuration that does this. It might also be reasonable to do this by default in IDLE or even the interactive shell. -jJ