[Python-ideas] allow `lambda' to be spelled λ

Rustom Mody rustompmody at gmail.com
Tue Jul 19 03:32:39 EDT 2016



On Tuesday, July 19, 2016 at 12:39:04 PM UTC+5:30, Neil Girdhar wrote:
>
> One solution would be to restrict identifiers to only Unicode characters 
> in appropriate categories.  The opening quotation mark is in the Unicode
> category for punctuation (Pi, "Punctuation, initial quote"), so it doesn't
> make sense to have it be part of an identifier.
>
> http://www.fileformat.info/info/unicode/category/index.htm
>

Python 3 already does that, as far as I can see:
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
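For concreteness, here is a quick way to inspect the classification Python 3 applies, using only the standard `unicodedata` module and `str.isidentifier()`. The helper name `describe` is an illustrative assumption, not part of any API, and the results reflect the Unicode database bundled with whichever interpreter runs it.

```python
import unicodedata

def describe(ch: str) -> str:
    """Illustrative helper: code point, Unicode name and general category of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '?')} ({unicodedata.category(ch)})"

# Latin A, Greek capital Alpha, underscore, and the "smart" opening quote from the thread
for ch in ["A", "\u0391", "_", "\u201c"]:
    print(describe(ch), "| valid identifier:", ch.isidentifier())
```

The opening quote lands in category Pi (initial punctuation), so it can never be part of an identifier, while Greek capital Alpha (Lu) is accepted exactly like Latin A.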

The point is that when a character doesn't fall in the permitted
classification(s), the error Python raises suggests that the lexer is not
really Unicode-aware.
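A minimal sketch of the kind of diagnostic being asked for: report the first non-ASCII character that cannot appear in an identifier, with its position and Unicode name, in the spirit of Python 2's "Non-ASCII character" message. `first_suspicious_char` is a hypothetical helper, not something Python provides, and for simplicity it does not skip string literals or comments. The exact `SyntaxError` text from `compile()` differs between Python 3 versions, so it is printed rather than assumed.

```python
import unicodedata

def first_suspicious_char(source: str):
    """Hypothetical helper: return (line, column, description) of the first
    non-ASCII character that cannot continue an identifier, or None.
    Line and column are 1-based. String literals and comments are NOT skipped."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Prefixing "a" tests ch as an identifier *continuation* character
            if ord(ch) > 127 and not ("a" + ch).isidentifier():
                return lineno, col, f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"
    return None

# The offending line from the thread, with curly quotes
src = 'wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)\n'

try:
    compile(src, "foo.py", "exec")
except SyntaxError as exc:
    print("compile() rejected it:", exc.msg)

print(first_suspicious_char(src))
```

For the line from the thread, the helper points at line 1, column 16, the opening U+201C LEFT DOUBLE QUOTATION MARK, rather than at an "identifier".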
 

>
>
> On Tuesday, July 19, 2016 at 1:29:35 AM UTC-4, Rustom Mody wrote:
>>
>> On Tuesday, July 19, 2016 at 10:20:29 AM UTC+5:30, Nick Coghlan wrote:
>>>
>>> On 18 July 2016 at 13:41, Rustom Mody <rusto... at gmail.com> wrote:
>>> > Do consider:
>>> >
>>> >>>> Α = 1
>>> >>>> A = 2
>>> >>>> Α + 1 == A
>>> > True
>>> >>>>
>>> >
>>> > This can (IMHO) go all the way to
>>> > https://en.wikipedia.org/wiki/IDN_homograph_attack
>>>
>>> Yes, we know - that dramatic increase in the attack surface is why
>>> PyPI is still ASCII only, even though full Unicode support is
>>> theoretically possible.
>>>
>>> It's not a major concern once an attacker already has you running
>>> arbitrary code on your system though, as the main problem there is
>>> that they're *running arbitrary code on your system*. That means the
>>> usability gains easily outweigh the increased obfuscation potential,
>>> as worrying about confusable attacks at that point is like worrying
>>> about a dripping tap upstairs when the Brisbane River is already
>>> flowing through the ground floor of your house :)
>>>
>>> Cheers,
>>>
>>>
>> There was this question on the Python list a few days ago:
>> Subject: SyntaxError: Non-ASCII character
>>
>> Chris Angelico pointed out the offending line:
>> wf = wave.open(“test.wav”, “rb”)
>> (should be wf = wave.open("test.wav", "rb") instead)
>>
>> Since he also said:
>> > The solution may be as simple as running "python3 script.py" rather
>> > than "python script.py".
>>
>> I pointed out that the Python 2 error was more helpful (to my eyes) than 
>> Python 3's:
>>
>>
>> Python 3:
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/home/ariston/foo.py", line 31
>>     wf = wave.open(“test.wav”, “rb”)
>>                        ^
>> SyntaxError: invalid character in identifier
>>
>> Python 2:
>>
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "foo.py", line 31
>> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no 
>> encoding declared; see http://python.org/dev/peps/pep-0263/ for details 
>>
>> IOW:
>> 1. The lexer is internally (evidently, from the error message) so
>> ASCII-oriented that any “unicode-junk” just defaults to being treated as
>> an identifier (presumably comments are dealt with earlier); then, when
>> that lexing action fails, it mistakenly pinpoints a wrong *identifier*
>> rather than simply an impermissible character, as Python 2 does.
>> Combine that with:
>> 2. Matrix multiplication (@) being OK to emulate Perl, but not OK to go
>> outside ASCII,
>>
>> and it seems (to me) that Python's Unicode support is somewhat wrongheaded.
>>
>>
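The Α/A example quoted above can be reproduced as a script. PEP 3131 has Python NFKC-normalize identifiers, but NFKC does not map GREEK CAPITAL LETTER ALPHA to LATIN CAPITAL LETTER A, so the two names stay distinct variables. `non_ascii_names` is a hypothetical lint built on the standard `tokenize` module, sketched under the assumption that flagging every non-ASCII identifier is acceptable; it is not a full confusable-character check of the kind Unicode TR 39 describes.

```python
import io
import tokenize
import unicodedata

# "\u0391" is GREEK CAPITAL LETTER ALPHA; "A" is LATIN CAPITAL LETTER A
source = "\u0391 = 1\nA = 2\nprint(\u0391 + 1 == A)\n"

# Identifiers are NFKC-normalized (PEP 3131), yet Alpha does not fold to Latin A
assert unicodedata.normalize("NFKC", "\u0391") != "A"

namespace = {}
exec(source, namespace)  # prints True: the two look-alike names are different variables

def non_ascii_names(src: str):
    """Hypothetical lint: yield (line, column, name) for each NAME token that
    contains non-ASCII characters. Lines are 1-based, columns 0-based (as in tokenize)."""
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield tok.start[0], tok.start[1], tok.string

print(list(non_ascii_names(source)))
```

`str.isascii()` requires Python 3.7 or later; on this input the lint flags both uses of Alpha (line 1, column 0 and line 3, column 6).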

