One solution would be to restrict identifiers to only Unicode characters in appropriate classes. The open quotation mark is in the Unicode general category for punctuation, so it doesn't make sense to have it be part of an identifier.
http://www.fileformat.info/info/unicode/category/index.htm

On Tuesday, July 19, 2016 at 1:29:35 AM UTC-4, Rustom Mody wrote:
On Tuesday, July 19, 2016 at 10:20:29 AM UTC+5:30, Nick Coghlan wrote:
On 18 July 2016 at 13:41, Rustom Mody <rusto...@gmail.com> wrote:
Do consider:
>>> Α = 1
>>> A = 2
>>> Α + 1 == A
True
Can (IMHO) go all the way to https://en.wikipedia.org/wiki/IDN_homograph_attack
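To make the two look-alike names above concrete, here is a minimal sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration. It shows that the Greek and Latin capitals are distinct code points in the same letter category, and that NFKC normalization (the form Python applies to identifiers) does not fold one into the other.

```python
import unicodedata

greek = "\u0391"  # GREEK CAPITAL LETTER ALPHA
latin = "A"       # LATIN CAPITAL LETTER A


def describe(ch):
    """Hypothetical helper: code point, Unicode name and general category."""
    return "U+%04X %s (%s)" % (ord(ch), unicodedata.name(ch), unicodedata.category(ch))


print(describe(greek))
print(describe(latin))

# Both are valid identifiers on their own, so a category-based rule cannot
# tell them apart.
both_valid = greek.isidentifier() and latin.isidentifier()

# NFKC normalization leaves the two letters distinct.
same_after_nfkc = (
    unicodedata.normalize("NFKC", greek) == unicodedata.normalize("NFKC", latin)
)
print(both_valid, same_after_nfkc)
```

So restricting identifiers by character class would not help with this particular confusion; both characters are ordinary uppercase letters.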
Yes, we know - that dramatic increase in the attack surface is why PyPI is still ASCII only, even though full Unicode support is theoretically possible.
It's not a major concern once an attacker already has you running arbitrary code on your system though, as the main problem there is that they're *running arbitrary code on your system*. That means the usability gains easily outweigh the increased obfuscation potential, as worrying about confusable attacks at that point is like worrying about a dripping tap upstairs when the Brisbane River is already flowing through the ground floor of your house :)
Cheers,
There was this question on the python list a few days ago: Subject: SyntaxError: Non-ASCII character
Chris Angelico pointed out the offending line:
wf = wave.open(“test.wav”, “rb”)
(should be
wf = wave.open("test.wav", "rb")
instead)
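For anyone who wants to check what those quote characters actually are, a small sketch with the standard `unicodedata` module follows. It prints the general category of the two curly quotes next to the ASCII quote, and checks whether a name starting with a curly quote counts as an identifier. The variable names are only for illustration.

```python
import unicodedata

# The two curly quotes from the offending line, plus the ASCII quote.
for ch in ["\u201c", "\u201d", '"']:
    print("U+%04X %-28s %s" % (ord(ch), unicodedata.name(ch), unicodedata.category(ch)))

# str.isidentifier() applies Python 3's identifier rules to a string.
curly_is_identifier = "\u201ctest".isidentifier()
plain_is_identifier = "test".isidentifier()
print(curly_is_identifier, plain_is_identifier)
```

All three quotes are in punctuation categories (Pi, Pf and Po respectively), and none of them is accepted inside an identifier.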
Since he also said:
The solution may be as simple as running "python3 script.py" rather than "python script.py".
I pointed out that the python2 error was more helpful (to my eyes) than python3's:
Python3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ariston/foo.py", line 31
    wf = wave.open(“test.wav”, “rb”)
                       ^
SyntaxError: invalid character in identifier
Python2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "foo.py", line 31
SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
IOW
1. The lexer is internally (evidently from the error message) so ASCII-oriented that any “unicode-junk” just defaults out to identifiers (presumably comments are dealt with earlier), and then if that lexing action fails it mistakenly pinpoints a wrong *identifier* rather than just an impermissible character like python 2
combine that with
2. matrix mult (@): Ok to emulate perl but not to go outside ASCII
makes it seem (to me) python's unicode support is somewhat wrongheaded.
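
As a rough illustration of the kind of message I have in mind, here is a naive sketch that scans a line and reports each impermissible character by column and Unicode name. It is not how CPython's lexer works; `find_impermissible` and `IDENTIFIER_CATEGORIES` are hypothetical names, the category set only approximates the real identifier rules, and the scan does not skip string literals or comments.

```python
import unicodedata

# Hypothetical approximation: general categories that can plausibly occur in
# an identifier (letters, letter numbers, marks, digits, connectors).
IDENTIFIER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc"}


def find_impermissible(line):
    """Return (column, char, name) for each non-ASCII char outside those categories."""
    hits = []
    for col, ch in enumerate(line, start=1):
        if ord(ch) > 127 and unicodedata.category(ch) not in IDENTIFIER_CATEGORIES:
            hits.append((col, ch, unicodedata.name(ch, "<unnamed>")))
    return hits


# The offending line from the thread, with the curly quotes written as escapes.
line = "wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"
hits = find_impermissible(line)
for col, ch, name in hits:
    print("column %d: impermissible character %r (%s)" % (col, ch, name))
```

A message along these lines names the character itself, instead of blaming the identifier it happens to sit next to.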