On Tuesday, July 19, 2016 at 10:20:29 AM UTC+5:30, Nick Coghlan wrote:
On 18 July 2016 at 13:41, Rustom Mody <rusto...@gmail.com> wrote:
Do consider:
    >>> Α = 1
    >>> A = 2
    >>> Α + 1 == A
    True
Can (IMHO) go all the way to https://en.wikipedia.org/wiki/IDN_homograph_attack
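To make the confusable pair above visible, here is a minimal sketch (the helper names `describe_identifier` and `non_ascii_identifiers` are made up for illustration, not anything from the stdlib) that uses `tokenize` and `unicodedata` to flag identifiers containing non-ASCII characters and to spell out what each character really is:

```python
import io
import tokenize
import unicodedata


def describe_identifier(name):
    """Return (character, code point, Unicode name) for each character in name."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in name
    ]


def non_ascii_identifiers(source):
    """Yield (identifier, (row, col)) for every NAME token with a non-ASCII character."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            yield tok.string, tok.start


if __name__ == "__main__":
    # The first identifier is GREEK CAPITAL LETTER ALPHA, the second is LATIN CAPITAL LETTER A.
    source = "\u0391 = 1\nA = 2\nprint(\u0391 + 1 == A)\n"
    for ident, pos in non_ascii_identifiers(source):
        print(pos, describe_identifier(ident))
```

This only reports that a name leaves ASCII; it does not decide whether two names are confusable with each other, which would need the Unicode confusables data.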
Yes, we know - that dramatic increase in the attack surface is why PyPI is still ASCII only, even though full Unicode support is theoretically possible.
It's not a major concern once an attacker already has you running arbitrary code on your system though, as the main problem there is that they're *running arbitrary code on your system*. That means the usability gains easily outweigh the increased obfuscation potential, as worrying about confusable attacks at that point is like worrying about a dripping tap upstairs when the Brisbane River is already flowing through the ground floor of your house :)
Cheers,
There was this question on the python list a few days ago:

Subject: SyntaxError: Non-ASCII character

Chris Angelico pointed out the offending line:

    wf = wave.open(“test.wav”, “rb”)

(should be

    wf = wave.open("test.wav", "rb")

instead)

Since he also said:
The solution may be as simple as running "python3 script.py" rather than "python script.py".
I pointed out that the python2 error was more helpful (to my eyes) than python3's.

Python3

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ariston/foo.py", line 31
        wf = wave.open(“test.wav”, “rb”)
        ^
    SyntaxError: invalid character in identifier

Python2

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "foo.py", line 31
    SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

IOW

1. The lexer is internally (evidently from the error message) so ASCII-oriented that any “unicode-junk” just defaults out to identifiers (presumably comments are dealt with earlier), and then if that lexing action fails it mistakenly pinpoints a wrong *identifier* rather than just an impermissible character, as python 2 does.

Combine that with

2. matrix mult (@): OK to emulate perl, but not to go outside ASCII

and it makes it seem (to me) that python's unicode support is somewhat wrongheaded.
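To illustrate the kind of diagnostic I have in mind, here is a rough sketch. It is not how CPython's tokenizer works; `report_suspect_characters` is a made-up helper that scans a line and names every non-ASCII character that could not be part of an identifier anyway, so the message can point at the character rather than at an "identifier":

```python
import unicodedata


def report_suspect_characters(line):
    """Return (column, code point, Unicode name) for each non-ASCII character
    in line that is not permitted anywhere inside an identifier."""
    findings = []
    for col, ch in enumerate(line):
        if ord(ch) < 128:
            continue
        # Prefixing "a" accepts characters that are valid only in
        # continuation position, as well as valid starting characters.
        if ("a" + ch).isidentifier():
            continue
        findings.append((col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return findings


if __name__ == "__main__":
    # The offending line from the thread, with curly quotes written as escapes.
    line = "wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"
    for col, codepoint, name in report_suspect_characters(line):
        print(f"column {col}: {codepoint} {name}")
```

On the offending line this flags the four curly quotes (U+201C and U+201D), whose UTF-8 encoding starts with the '\xe2' byte that the python2 message complains about.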