[Python-Dev] Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2 Nov 2021

On 01. 11. 21 18:32, Serhiy Storchaka wrote:
...
This is excellent!
01.11.21 14:17, Petr Viktorin wrote:
...
...
CPython treats the control character NUL (``\0``) as end of input,
but many editors simply skip it, possibly showing code that Python
will not run as a regular part of a file.
It is an implementation detail and we will get rid of it. It only
happens when you read the Python script from a file. If you import it as
a module or run with runpy, the NUL character is an error.
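The stricter behaviour can be seen by handing source bytes to the built-in
compile(), which, as far as I know, is roughly what the import machinery does
with a module's contents. This is only a sketch: the helper name
nul_is_rejected is made up for illustration, and the exception type differs
between CPython versions, so both are caught.

```python
def nul_is_rejected(source: bytes) -> bool:
    """Return True if compile() refuses to compile the given source."""
    try:
        compile(source, "<demo>", "exec")
    except (ValueError, SyntaxError):
        # Depending on the CPython version, an embedded NUL is reported
        # as either ValueError or SyntaxError.
        return True
    return False


# Source with a NUL between two otherwise valid statements.
print(nul_is_rejected(b"x = 1\n\0y = 2\n"))
# The same source without the NUL compiles fine.
print(nul_is_rejected(b"x = 1\ny = 2\n"))
```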
That brings us to possible changes in Python in this area, which is an
interesting topic.

As for \0, can we ban all ASCII & C1 control characters except 
whitespace? I see no place for them in source code.
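To make the proposal concrete, here is a rough sketch of such a check as an
external linter-style function, not a tokenizer change. The name
find_control_chars and the exact set of allowed whitespace (tab, LF, CR, form
feed) are assumptions for illustration; the real rule would need to be decided.

```python
# Assumption: these are the only control characters allowed in source.
ALLOWED_WHITESPACE = {"\t", "\n", "\r", "\f"}


def find_control_chars(source: str):
    """Return (line, column, code point) for each disallowed control char.

    Covers the ASCII (C0) controls U+0000..U+001F, DEL (U+007F) and the
    C1 controls U+0080..U+009F. Lines are 1-based, columns 0-based.
    """
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            cp = ord(ch)
            is_c0 = cp < 0x20 or cp == 0x7F
            is_c1 = 0x80 <= cp <= 0x9F
            if (is_c0 or is_c1) and ch not in ALLOWED_WHITESPACE:
                hits.append((lineno, col, f"U+{cp:04X}"))
    return hits


print(find_control_chars("a = 1\n b\x00\n"))
```

Note that this flags control characters inside string literals and comments
too, which matches "I see no place for them in source code" but may be
stricter than some people want.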

For homoglyphs/confusables, should there be a SyntaxWarning when an 
identifier looks like ASCII but isn't?
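A toy version of the "looks like ASCII but isn't" test might look like the
following. The lookup table is a tiny hand-picked sample, not the Unicode
confusables data, and the names LOOKALIKES and looks_ascii_but_is_not are
hypothetical; a real check would need the full data set and some care about
false positives.

```python
# Illustrative sample only: a few Cyrillic/Greek letters that resemble
# Latin ones. Real detection would use the Unicode confusables data.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def looks_ascii_but_is_not(name: str) -> bool:
    """True if name is non-ASCII but maps entirely onto ASCII look-alikes."""
    if name.isascii():
        return False
    skeleton = "".join(LOOKALIKES.get(ch, ch) for ch in name)
    return skeleton.isascii()


print(looks_ascii_but_is_not("sc\u043epe"))  # Cyrillic "o" inside "scope"
print(looks_ascii_but_is_not("scope"))       # plain ASCII
```

An identifier written wholly in another script, such as a Hebrew word, is not
flagged by this test, since its skeleton stays non-ASCII.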

For right-to-left text: does anyone actually name identifiers in 
Hebrew/Arabic? AFAIK, we should allow a few non-printing 
"joiner"/"non-joiner" characters to make it possible to use all Arabic 
words. But it would be great to consult with users/teachers of the 
languages.
Should Python run the bidi algorithm when parsing and disallow reordered 
tokens? Maybe optionally?
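
Running the full bidi algorithm is a bigger job, so as a much simpler
stand-in, here is a sketch that only reports explicit bidi formatting
characters (embeddings, overrides, isolates), using the standard
unicodedata.bidirectional() function. It does not detect reordered tokens;
the function name find_bidi_controls is made up for illustration.

```python
import unicodedata

# Bidi classes of the explicit formatting characters
# (U+202A..U+202E and U+2066..U+2069).
EXPLICIT_BIDI_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF",
    "LRI", "RLI", "FSI", "PDI",
}


def find_bidi_controls(source: str):
    """Return (index, code point, bidi class) for each explicit bidi control."""
    hits = []
    for index, ch in enumerate(source):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in EXPLICIT_BIDI_CLASSES:
            hits.append((index, f"U+{ord(ch):04X}", bidi_class))
    return hits


# A string literal hiding a RIGHT-TO-LEFT OVERRIDE.
print(find_bidi_controls("x = 'a\u202eb'"))
```

The zero-width joiner and non-joiner (U+200D, U+200C) are not in these
classes, so this particular check would leave them alone.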