[Python-ideas] allow `lambda' to be spelled λ

Steven D'Aprano steve at pearwood.info
Tue Jul 19 07:20:27 EDT 2016

On Mon, Jul 18, 2016 at 10:29:34PM -0700, Rustom Mody wrote:

> There was this question on the python list a few days ago:
> Subject: SyntaxError: Non-ASCII character
> I pointed out that the python2 error was more helpful (to my eyes) than 
> python3s

And I pointed out how I thought the Python 3 error message could be 
improved, and that the Python 2 error message was not very good either.

> Python3 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ariston/foo.py", line 31
>     wf = wave.open(“test.wav”, “rb”)
>                        ^
> SyntaxError: invalid character in identifier

It would be much more helpful if the caret lined up with the offending 
character. Better still, if the offending character were actually stated:

    wf = wave.open(“test.wav”, “rb”)
SyntaxError: invalid character '“' in identifier
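
To make the suggestion concrete, here is a rough sketch of how such a 
message could be built. It is only an illustration, not how the CPython 
tokenizer works: the helper names (find_offending_char, 
format_syntax_error) are made up, and the scan is naive in that it does 
not skip string literals or comments.

```python
def find_offending_char(line):
    """Return (index, char) for the first non-ASCII character that cannot
    appear in an identifier, or None if the line has no such character."""
    for index, ch in enumerate(line):
        if ord(ch) < 128:
            continue  # ASCII is left to the ordinary tokenizer rules
        # "a" + ch is an identifier only if ch is legal inside one.
        if not ("a" + ch).isidentifier():
            return index, ch
    return None


def format_syntax_error(line, indent=4):
    """Build a message that names the character and aligns the caret."""
    found = find_offending_char(line)
    if found is None:
        return None
    index, ch = found
    caret = " " * (indent + index) + "^"
    return "\n".join([
        " " * indent + line,
        caret,
        "SyntaxError: invalid character '{}' in identifier".format(ch),
    ])


print(format_syntax_error("wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"))
```

The caret ends up under the first curly quote rather than somewhere 
further along the line, and the quote itself appears in the message.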

> Python2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "foo.py", line 31
> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no 
> encoding declared; see http://python.org/dev/peps/pep-0263/ for details 

As I pointed out earlier, this is less helpful. The line itself is not 
shown (although the line number is given), nor is the offending 
character. (Python 2 can't show the character because it doesn't know 
what it is -- it only knows the byte value, not the encoding.) But in 
the person's text editor, chances are they will see what looks to them 
like a perfectly reasonable character, and have no idea which character 
corresponds to the byte \xe2.
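
A quick way to see why the bare byte value is so unhelpful, assuming the 
file was saved as UTF-8 (the usual case, but an assumption on my part): 
the curly quote is three bytes long, Python 2 reported only the first of 
them, and that same first byte starts many unrelated characters.

```python
# U+201C LEFT DOUBLE QUOTATION MARK, written as an escape so this
# snippet itself stays pure ASCII.
left_quote = "\u201c"
encoded = left_quote.encode("utf-8")

print(encoded)          # b'\xe2\x80\x9c'
print(hex(encoded[0]))  # 0xe2, the byte named in the Python 2 message

# Other characters whose UTF-8 encoding begins with the same byte:
# right curly quote, euro sign, horizontal ellipsis.
same_lead_byte = [ch for ch in "\u201d\u20ac\u2026"
                  if ch.encode("utf-8")[0] == 0xE2]
print(len(same_lead_byte))  # 3
```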

> 1. The lexer is internally (evidently from the error message) so 
> ASCII-oriented that any “unicode-junk” just defaults out to identifiers 
> (presumably comments are dealt with earlier) and then if that lexing action 
> fails it mistakenly pinpoints a wrong *identifier* rather than just an 
> impermissible character like python 2

You seem to be jumping to a rather large conclusion here. Even if you 
are right that the lexer considers all otherwise-unexpected characters 
to be part of an identifier, why is that a problem?

I agree that it is mildly misleading to say 

invalid character '“' in identifier

when “ is not part of an identifier:

py> '“test'.isidentifier()
False

but I don't think you can jump from that to your conclusion that 
Python's unicode support is somewhat "wrongheaded". Surely a much 
simpler, less inflammatory response would be to say that this one 
specific error message could be improved?

But... is it REALLY so bad? What if we wrote it like this instead:

py> result = my§function(arg)
  File "<stdin>", line 1
    result = my§function(arg)
SyntaxError: invalid character in identifier

Isn't it more reasonable to consider that "my§function" looks like it is 
intended as an identifier, but it happens to have an illegal character 
in it?
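
For anyone who wants to poke at this without a file on disk, the same 
error can be triggered with compile(). This is just a sketch: the helper 
name try_compile is made up, and the exact wording of the message differs 
between Python versions, so it is printed rather than relied upon.

```python
def try_compile(source):
    """Return the SyntaxError raised by compiling `source`, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc
    return None


source = "result = my\u00a7function(arg)"  # U+00A7 is the section sign
err = try_compile(source)

print("my\u00a7function".isidentifier())  # False
print(type(err).__name__)                  # SyntaxError
print(err.lineno, err.offset)              # where the tokenizer gave up
print(err.msg)                             # wording depends on the version
```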

> combine that with
> 2. matrix mult (@) Ok to emulate perl but not to go outside ASCII

How does @ emulate Perl?

As for your second part, about not going outside of ASCII, yes, that is 
official policy for Python operators, keywords and builtins.

> makes it seem  (to me) python's unicode support is somewhat wrongheaded.

