On Tuesday, July 19, 2016 at 12:39:04 PM UTC+5:30, Neil Girdhar wrote:
One solution would be to restrict identifiers to only Unicode characters in appropriate classes.  The open quotation mark is in the code class for punctuation, so it doesn't make sense to have it be part of an identifier.


Python (3) is doing that all right, as far as I can see.

The point is that when a character doesn’t fall in the permitted classification(s), the error it raises suggests that the lexer is not really Unicode-aware.
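
For what it's worth, here is a small sketch of what "restricting identifiers to appropriate classes" looks like from the Python 3 side. It only uses the standard `unicodedata` module and `str.isidentifier()`; the helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(ch):
    """Return (Unicode general category, whether ch alone is a valid identifier)."""
    return unicodedata.category(ch), ch.isidentifier()

# Latin A, Greek Alpha, left double quotation mark, commercial at
for ch in ["A", "\u0391", "\u201c", "@"]:
    print(repr(ch), unicodedata.name(ch), describe(ch))
```

The two letters are in category Lu and are accepted; the open quotation mark is in a punctuation category (Pi) and is rejected, so the classification itself is already being applied.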

On Tuesday, July 19, 2016 at 1:29:35 AM UTC-4, Rustom Mody wrote:
On Tuesday, July 19, 2016 at 10:20:29 AM UTC+5:30, Nick Coghlan wrote:
On 18 July 2016 at 13:41, Rustom Mody <rusto...@gmail.com> wrote:
> Do consider:
> >>> Α = 1
> >>> A = 2
> >>> Α + 1 == A
> True
> Can (IMHO) go all the way to
> https://en.wikipedia.org/wiki/IDN_homograph_attack

Yes, we know - that dramatic increase in the attack surface is why
PyPI is still ASCII only, even though full Unicode support is
theoretically possible.

It's not a major concern once an attacker already has you running
arbitrary code on your system though, as the main problem there is
that they're *running arbitrary code on your system*. That means the
usability gains easily outweigh the increased obfuscation potential,
as worrying about confusable attacks at that point is like worrying
about a dripping tap upstairs when the Brisbane River is already
flowing through the ground floor of your house :)
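
To make the confusable example quoted above concrete, here is a rough sketch showing that Greek capital Alpha and Latin capital A are distinct code points that survive NFKC normalization (which Python applies to identifiers), together with a crude mixed-script check. The helper `letter_scripts` is hypothetical and relies only on the first word of each character's Unicode name, so treat it as a heuristic rather than a real confusables detector.

```python
import unicodedata

def letter_scripts(identifier):
    """Crude heuristic: first word of each letter's Unicode name, e.g. 'LATIN' or 'GREEK'."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if unicodedata.category(ch).startswith("L")
    }

greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA
latin_a = "A"           # LATIN CAPITAL LETTER A

print(greek_alpha == latin_a)                                 # False
print(unicodedata.normalize("NFKC", greek_alpha) == latin_a)  # False
print(sorted(letter_scripts("\u0391BC")))                     # ['GREEK', 'LATIN']
```

An identifier whose letters come from more than one script is not necessarily malicious, but it is the kind of thing a linter could flag for a human to look at.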


There was this question on the python list a few days ago:
Subject: SyntaxError: Non-ASCII character

Chris Angelico pointed out the offending line:
wf = wave.open(“test.wav”, “rb”)
(should be wf = wave.open("test.wav", "rb") instead)
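
As an aside, a minimal sketch of mechanically repairing such a line: map the curly quotes to their ASCII counterparts with `str.translate`. The names `SMART_QUOTES` and `straighten_quotes` are invented here, and the approach is blunt, since it also rewrites curly quotes that legitimately appear inside string literals or comments.

```python
# Map typographic quotes to plain ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def straighten_quotes(source):
    """Return source with curly quotes replaced by straight ones."""
    return source.translate(SMART_QUOTES)

print(straighten_quotes("wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"))
```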

Since he also said:
> The solution may be as simple as running "python3 script.py" rather than "python script.py".

I pointed out that the Python 2 error (the second traceback below) was more helpful (to my eyes) than Python 3's (the first traceback below):


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ariston/foo.py", line 31
    wf = wave.open(“test.wav”, “rb”)
SyntaxError: invalid character in identifier


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "foo.py", line 31
SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

1. Judging from the error message, the lexer is internally so ASCII-oriented that any “unicode-junk” just defaults to being lexed as part of an identifier (presumably comments are dealt with earlier). When that lexing action fails, it mistakenly pinpoints a wrong *identifier* rather than just an impermissible character, as Python 2 does.

Combine that with:

2. Matrix multiplication got (@): it is OK to emulate Perl, but not to go outside ASCII.

Together these make it seem (to me) that Python's Unicode support is somewhat wrongheaded.
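
The Python 3 behaviour described in point 1 can be reproduced without a file by handing the offending line to the built-in `compile()`. This is only a sketch: `syntax_error_message` is a made-up helper, and the exact wording of the message depends on the Python 3 version in use, so no particular text is assumed here.

```python
def syntax_error_message(source, filename="foo.py"):
    """Compile source without running it; return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.msg
    return None

bad = "wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"  # curly quotes
good = 'wf = wave.open("test.wav", "rb")'                     # straight quotes

print(syntax_error_message(bad))   # some SyntaxError message; wording varies by version
print(syntax_error_message(good))  # None
```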