On Tue, Jul 19, 2016 at 7:21 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Jul 18, 2016 at 10:29:34PM -0700, Rustom Mody wrote:
There was this question on the python list a few days ago: Subject: SyntaxError: Non-ASCII character [...] I pointed out that the Python 2 error was more helpful (to my eyes) than Python 3's.
And I pointed out how I thought the Python 3 error message could be improved, but the Python 2 error message was not very good.
Python3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ariston/foo.py", line 31
    wf = wave.open(“test.wav”, “rb”)
                       ^
SyntaxError: invalid character in identifier
It would be much more helpful if the caret lined up with the offending character. Better still, if the offending character was actually stated:
    wf = wave.open(“test.wav”, “rb”)
                   ^
SyntaxError: invalid character '“' in identifier
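For illustration only, here is a minimal sketch of how such a message could be assembled. The helper `describe_bad_character` is hypothetical and not anything in CPython, and it simply flags the first non-ASCII character on the line, which is a much cruder rule than a real tokenizer would need (non-ASCII characters are perfectly legal inside strings, comments and identifiers).

```python
def describe_bad_character(line):
    """Return (caret_line, message) for the first non-ASCII character.

    Hypothetical helper for illustration; not part of CPython.
    Returns None when the line is pure ASCII.
    """
    for column, ch in enumerate(line):
        if ord(ch) > 127:
            # Pad with spaces so the caret sits directly under the character
            caret = " " * column + "^"
            message = "SyntaxError: invalid character {!r} in identifier".format(ch)
            return caret, message
    return None


source = "wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"
caret, message = describe_bad_character(source)
print(source)
print(caret)
print(message)
```

The caret lines up only when the display uses one column per character, which holds for this example but not for wide or combining characters.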
Python2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "foo.py", line 31
SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
As I pointed out earlier, this is less helpful. The line itself is not shown (although the line number is given), nor is the offending character. (Python 2 can't show the character because it doesn't know what it is -- it only knows the byte value, not the encoding.) But in the person's text editor, chances are they will see what looks to them like a perfectly reasonable character, and have no idea which is the byte \xe2.
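To make the link between the character and the byte concrete: assuming the file was saved as UTF-8 (the traceback does not say so), the curly quote occupies three bytes, and \xe2 is merely the first of them. A short Python 3 sketch:

```python
quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK, the opening quote in the example
encoded = quote.encode("utf-8")  # assumption: the source file is UTF-8
print(encoded)  # b'\xe2\x80\x9c'

first_byte = encoded[:1]
print(first_byte)  # b'\xe2'

# On its own, the first byte is not a complete UTF-8 sequence,
# so it cannot be turned back into the character the user sees.
try:
    first_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)
```

The same byte \xe2 begins many other UTF-8 sequences too, which is why reporting it alone tells the user so little.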
In other words: 1. The lexer is internally (evidently from the error message) so ASCII-oriented that any “unicode-junk” just defaults to being treated as an identifier (presumably comments are dealt with earlier), and then, if that lexing action fails, it mistakenly reports a bad *identifier* rather than just an impermissible character, as Python 2 does.
You seem to be jumping to a rather large conclusion here. Even if you are right that the lexer considers all otherwise-unexpected characters to be part of an identifier, why is that a problem?
It's a problem because those characters could never be part of an identifier. So it seems like a bug.
I agree that it is mildly misleading to say
invalid character '“' in identifier
when “ is not part of an identifier:
py> '“test'.isidentifier()
False
but I don't think you can jump from that to your conclusion that Python's unicode support is somewhat "wrongheaded". Surely a much simpler, less inflammatory response would be to say that this one specific error message could be improved?
But... is it REALLY so bad? What if we wrote it like this instead:
py> result = my§function(arg)
  File "<stdin>", line 1
    result = my§function(arg)
                       ^
SyntaxError: invalid character in identifier
Isn't it more reasonable to consider that "my§function" looks like it is intended as an identifier, but it happens to have an illegal character in it?
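One way to check that reading, as a rough sketch: the hypothetical helper `first_invalid_identifier_char` below walks a would-be identifier and reports the first character that `str.isidentifier()` rejects. It only approximates the tokenizer's rules and is not how CPython itself reports the error.

```python
import unicodedata


def first_invalid_identifier_char(name):
    """Return (index, character, Unicode name) of the first bad character.

    Hypothetical helper for illustration. Returns None if every
    character is acceptable at its position.
    """
    for index, ch in enumerate(name):
        # The first character must be able to start an identifier;
        # later characters only need to be able to continue one.
        probe = ch if index == 0 else "a" + ch
        if not probe.isidentifier():
            return index, ch, unicodedata.name(ch, "<unnamed>")
    return None


print(first_invalid_identifier_char("my\u00a7function"))  # (2, '§', 'SECTION SIGN')
print(first_invalid_identifier_char("\u201ctest"))  # (0, '“', 'LEFT DOUBLE QUOTATION MARK')
print(first_invalid_identifier_char("my_function"))  # None
```

In both failing cases a single character is at fault, so an error message could name that character rather than the whole token.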
combine that with 2. matrix mult (@) Ok to emulate perl but not to go outside ASCII
How does @ emulate Perl?
As for your second part, about not going outside of ASCII, yes, that is official policy for Python operators, keywords and builtins.
makes it seem (to me) python's unicode support is somewhat wrongheaded.
-- Steve