[Python-ideas] allow `lambda' to be spelled λ
Steven D'Aprano
steve at pearwood.info
Tue Jul 19 07:20:27 EDT 2016
On Mon, Jul 18, 2016 at 10:29:34PM -0700, Rustom Mody wrote:
> There was this question on the python list a few days ago:
> Subject: SyntaxError: Non-ASCII character
[...]
> I pointed out that the python2 error was more helpful (to my eyes) than
> python3s
And I pointed out how I thought the Python 3 error message could be
improved, but the Python 2 error message was not very good.
> Python3
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/ariston/foo.py", line 31
> wf = wave.open(“test.wav”, “rb”)
> ^
> SyntaxError: invalid character in identifier
It would be much more helpful if the caret lined up with the offending
character. Better still, if the offending character were actually stated:
    wf = wave.open(“test.wav”, “rb”)
                   ^
    SyntaxError: invalid character '“' in identifier
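To illustrate the kind of report suggested above, here is a minimal sketch of how a tool could locate and name the first non-ASCII character in a source line. The helper name `describe_first_non_ascii` is hypothetical, and this is not how the CPython tokenizer works; it only shows that the column and the character are cheap to recover once the line is decoded text.

```python
import unicodedata


def describe_first_non_ascii(line):
    """Return (column, char, name) for the first non-ASCII character, or None.

    Hypothetical helper for illustration only; not part of CPython.
    """
    for col, ch in enumerate(line):
        if ord(ch) > 127:
            # Fall back to a placeholder if the character has no Unicode name.
            return col, ch, unicodedata.name(ch, "<unnamed>")
    return None


line = 'wf = wave.open(“test.wav”, “rb”)'
found = describe_first_non_ascii(line)
if found is not None:
    col, ch, name = found
    print(line)
    print(" " * col + "^")  # caret lined up with the offending character
    print("invalid character {!r} ({}, U+{:04X})".format(ch, name, ord(ch)))
```

This flags any non-ASCII character, which is stricter than Python 3's actual rules (non-ASCII letters are legal in identifiers), so treat it as a diagnostic aid rather than a validity check.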
> Python2
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "foo.py", line 31
> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no
> encoding declared; see http://python.org/dev/peps/pep-0263/ for details
As I pointed out earlier, this is less helpful. The line itself is not
shown (although the line number is given), nor is the offending
character. (Python 2 can't show the character because it doesn't know
what it is -- it only knows the byte value, not the encoding.) But in
the person's text editor, chances are they will see what looks to them
like a perfectly reasonable character, and have no idea which one
corresponds to the byte \xe2.
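A short sketch of why the byte alone is so unhelpful, assuming the file was saved as UTF-8 (which the error message itself cannot know): \xe2 is only the first of three bytes encoding the curly quote, and the same lead byte starts many unrelated characters.

```python
# The left curly quote occupies three bytes in UTF-8; \xe2 is just the first.
left_quote = "“".encode("utf-8")
print(left_quote)

# Other common characters share the same lead byte.
right_quote = "”".encode("utf-8")
euro = "€".encode("utf-8")
print(right_quote, euro)

# Under a different (assumed) encoding, the same byte is a different character.
as_latin1 = b"\xe2".decode("latin-1")
print(as_latin1)
```

So a user told only about '\xe2' cannot tell whether the culprit is a quote, a currency sign, or an accented letter.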
> IOW
> 1. The lexer is internally (evidently from the error message) so
> ASCII-oriented that any “unicode-junk” just defaults out to identifiers
> (presumably comments are dealt with earlier) and then if that lexing action
> fails it mistakenly pinpoints a wrong *identifier* rather than just an
> impermissible character like python 2
You seem to be jumping to a rather large conclusion here. Even if you
are right that the lexer considers all otherwise-unexpected characters
to be part of an identifier, why is that a problem?
I agree that it is mildly misleading to say
invalid character '“' in identifier
when “ is not part of an identifier:
py> '“test'.isidentifier()
False
but I don't think you can jump from that to your conclusion that
Python's unicode support is somewhat "wrongheaded". Surely a much
simpler, less inflammatory response would be to say that this one
specific error message could be improved?
But... is it REALLY so bad? What if we wrote it like this instead:
py> result = my§function(arg)
File "<stdin>", line 1
result = my§function(arg)
^
SyntaxError: invalid character in identifier
Isn't it more reasonable to consider that "my§function" looks like it is
intended as an identifier, but it happens to have an illegal character
in it?
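One way to make that reading concrete is the sketch below, which takes a would-be identifier and lists the characters that stop it from being one. The function name `invalid_identifier_chars` is hypothetical, and it leans on `str.isidentifier()` rather than on anything the real lexer does.

```python
def invalid_identifier_chars(token):
    """Return a list of (index, char) pairs that cannot appear in an identifier.

    Hypothetical helper: a character is accepted if it can start an
    identifier, or (past the first position) if it can continue one.
    """
    bad = []
    for i, ch in enumerate(token):
        can_start = ch.isidentifier()
        can_continue = i > 0 and ("a" + ch).isidentifier()
        if not (can_start or can_continue):
            bad.append((i, ch))
    return bad


print(invalid_identifier_chars("my§function"))
print(invalid_identifier_chars("“test"))
print(invalid_identifier_chars("my_function"))
```

With something like this, the message could name the exact character in both cases, whether it sits in the middle of a plausible identifier or at the front of what was meant to be a string.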
> combine that with
> 2. matrix mult (@) Ok to emulate perl but not to go outside ASCII
How does @ emulate Perl?
As for your second part, about not going outside of ASCII, yes, that is
official policy for Python operators, keywords and builtins.
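The distinction can be checked from the interpreter. The sketch below only inspects whatever Python 3 it runs on; the helper `is_ascii` is introduced here for illustration. It shows that keywords and builtin names are ASCII, while user-chosen identifiers such as λ are allowed.

```python
import builtins
import keyword


def is_ascii(name):
    """True if every character of name is in the ASCII range."""
    return all(ord(c) < 128 for c in name)


# Keywords and builtin names on the running interpreter.
keywords_ascii = all(is_ascii(k) for k in keyword.kwlist)
builtins_ascii = all(is_ascii(n) for n in dir(builtins))
print(keywords_ascii, builtins_ascii)

# λ is a legal identifier for user code, but it is not a keyword.
print("λ".isidentifier(), keyword.iskeyword("λ"))

λ = lambda x: x + 1  # an ordinary name that happens to be non-ASCII
print(λ(1))
```

So the restriction applies to what the language itself defines, not to the names programmers may choose.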
> makes it seem (to me) python's unicode support is somewhat wrongheaded.
--
Steve