[Python-ideas] allow `lambda' to be spelled λ

Neil Girdhar mistersheik at gmail.com
Tue Jul 19 03:09:03 EDT 2016


One solution would be to restrict identifiers to Unicode characters in 
appropriate general categories.  The opening quotation mark is in a 
punctuation category (Pi, initial quote), so it doesn't make sense for it to 
be part of an identifier.

http://www.fileformat.info/info/unicode/category/index.htm
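
As a rough illustration of the category argument, here is a minimal sketch
using the standard unicodedata module.  The helper name describe() is my own
and not something from this thread; it just looks up the code point, name and
general category of a character, and str.isidentifier() is used to ask the
interpreter whether a string is acceptable as an identifier.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, general category) for one character.

    Hypothetical helper for illustration only.
    """
    return "U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch)

# The curly opening quote from the wave.open() example is punctuation.
print(describe("\u201c"))

# The Greek small letter from the subject line is a lowercase letter.
print(describe("\u03bb"))

# Ask the interpreter whether each can appear in an identifier.
print("\u201ctest".isidentifier())
print("\u03bb".isidentifier())
```

The point of the sketch is only that the category information is available
and already distinguishes the quote (punctuation) from the letter.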

On Tuesday, July 19, 2016 at 1:29:35 AM UTC-4, Rustom Mody wrote:
>
> On Tuesday, July 19, 2016 at 10:20:29 AM UTC+5:30, Nick Coghlan wrote:
>>
>> On 18 July 2016 at 13:41, Rustom Mody <rusto... at gmail.com> wrote:
>> > Do consider:
>> >
>> >>>> Α = 1
>> >>>> A = 2
>> >>>> Α + 1 == A
>> > True
>> >>>>
>> >
>> > Can (IMHO) go all the way to
>> > https://en.wikipedia.org/wiki/IDN_homograph_attack
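
To make the confusable pair above concrete, here is a small sketch, assuming
only the standard unicodedata module.  The helper non_ascii_report() is a
hypothetical name of mine, a crude check a reviewer might run over a name.
Python normalizes identifiers with NFKC (PEP 3131), and the sketch checks
whether that normalization maps the Greek letter onto the Latin one.

```python
import unicodedata

greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA, renders like Latin A
latin_a = "A"           # LATIN CAPITAL LETTER A

# Visually similar, but different code points, so the strings differ.
print(greek_alpha == latin_a)

# Check whether NFKC normalization turns the Greek letter into the Latin one.
print(unicodedata.normalize("NFKC", greek_alpha) == latin_a)

def non_ascii_report(identifier):
    """List (char, Unicode name) for each non-ASCII character in a name.

    Hypothetical helper for illustration only.
    """
    return [(ch, unicodedata.name(ch, "<unnamed>"))
            for ch in identifier if ord(ch) > 127]

print(non_ascii_report(greek_alpha))
print(non_ascii_report(latin_a))
```
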
>>
>> Yes, we know - that dramatic increase in the attack surface is why
>> PyPI is still ASCII only, even though full Unicode support is
>> theoretically possible.
>>
>> It's not a major concern once an attacker already has you running
>> arbitrary code on your system though, as the main problem there is
>> that they're *running arbitrary code on your system*. That means the
>> usability gains easily outweigh the increased obfuscation potential,
>> as worrying about confusable attacks at that point is like worrying
>> about a dripping tap upstairs when the Brisbane River is already
>> flowing through the ground floor of your house :)
>>
>> Cheers,
>>
>>
> There was this question on the python list a few days ago:
> Subject: SyntaxError: Non-ASCII character
>
> Chris Angelico pointed out the offending line:
> wf = wave.open(“test.wav”, “rb”)
> (should be wf = wave.open("test.wav", "rb") instead)
>
> Since he also said:
> > The solution may be as simple as running "python3 script.py" rather than 
> > "python script.py".
>
> I pointed out that the python2 error was more helpful (to my eyes) than 
> python3's
>
>
> Python3 
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ariston/foo.py", line 31
>     wf = wave.open(“test.wav”, “rb”)
>                        ^
> SyntaxError: invalid character in identifier
>
> Python2
>
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "foo.py", line 31
> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no 
> encoding declared; see http://python.org/dev/peps/pep-0263/ for details 
>
> IOW
> 1. The lexer is internally (evidently from the error message) so 
> ASCII-oriented that any “unicode-junk” just defaults to being treated as 
> part of an identifier (presumably comments are dealt with earlier), and 
> when that lexing action fails it mistakenly reports a bad *identifier* 
> rather than just an impermissible character, as python 2 does.
> Combine that with
> 2. matrix mult (@): OK to emulate perl but not to go outside ASCII
>
> and it seems (to me) that python's unicode support is somewhat wrongheaded.
>
>
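
For what it's worth, a diagnostic that names the impermissible character
could be sketched along these lines.  This is only an illustration, not how
the CPython tokenizer works: find_bad_chars() is a hypothetical helper, and
it ignores the fact that non-ASCII characters are perfectly legal inside
string literals and comments.  It asks the interpreter, via
str.isidentifier(), whether each non-ASCII character could continue an
identifier, and reports the ones that could not.

```python
import unicodedata

def find_bad_chars(line):
    """Return (column, char, name, category) for each non-ASCII character
    in `line` that cannot be part of an identifier.

    Hypothetical helper; it does not skip string literals or comments.
    """
    found = []
    for col, ch in enumerate(line):
        if ord(ch) < 128:
            continue
        # "a" + ch is a valid identifier only if ch may continue one.
        if not ("a" + ch).isidentifier():
            found.append((col, ch,
                          unicodedata.name(ch, "<unnamed>"),
                          unicodedata.category(ch)))
    return found

# The offending line from the python-list question, with curly quotes.
line = "wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"
for col, ch, name, cat in find_bad_chars(line):
    print("column %d: %r is %s (category %s)" % (col, ch, name, cat))
```

A message built from this kind of information would point at the quotation
mark itself rather than at an identifier.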