Nick Coghlan writes:
The reason that can help is that the main problem with "improving" error messages is that it can be really hard to tell whether the improvements are actually improvements or not.
Personally, I think the real issue here is that the curly quote (and things like the mathematical PRIME character) is easily confused with Python syntax, and it all looks like grit on Tim's monitor. I tried substituting an emoticon and the DOUBLE INTEGRAL, and it was quite obvious what was wrong from the Python 3 error message.<wink/>

However, in this case, as far as I can tell from the error messages induced by playing with ASCII, Python 3.5 thinks that all non-identifier ASCII characters are syntactic (so for example it says that

    with open($file.txt") as f:

is "invalid syntax"). But non-ASCII characters (I guess including the Latin 1 set?) are either letters, numerals, or just plain not valid in a Python program, AIUI (outside of strings and comments, of course).

I would think the lexer could just treat each invalid character as an invalid_token, which is always invalid in Python syntax, and the error would be a SyntaxError with the message formatted something like

    "invalid character {} = U+{:04X}".format(ch, ord(ch))

This should avoid the strange placement of the position indicator, too, since the error would point at the offending character itself.

If someday we decide to use a non-ASCII character for a syntactic purpose, that's a big enough compatibility break in itself that changing the invalid character set (and thus the definition of invalid_token) is insignificant.

I'm pretty sure this is what a couple of earlier posters have in mind, too.
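
To make the proposal concrete, here is a rough sketch in pure Python. It is not how the actual tokenizer works. The helper names (invalid_character_message, is_invalid_token, find_invalid_characters) are made up for illustration, and the "can this character appear in an identifier?" test via str.isidentifier() is my assumption about how the invalid set might be approximated. It also assumes the text passed in is already known to be outside strings and comments.

```python
def invalid_character_message(ch):
    """Format the proposed error message for one invalid character."""
    return "invalid character {} = U+{:04X}".format(ch, ord(ch))


def is_invalid_token(ch):
    """Hypothetical test for the proposed invalid_token.

    ASCII characters are left to the normal syntax rules. A non-ASCII
    character is treated as invalid if it cannot appear anywhere in an
    identifier (checked by appending it to a known-good start character).
    """
    if ord(ch) < 128:
        return False
    return not ("a" + ch).isidentifier()


def find_invalid_characters(code):
    """Return (column, message) pairs for each invalid character in code.

    Assumes code contains no string literals or comments; a real lexer
    would only apply this check outside of them.
    """
    return [
        (col, invalid_character_message(ch))
        for col, ch in enumerate(code)
        if is_invalid_token(ch)
    ]


if __name__ == "__main__":
    # A curly opening quote (U+201C) where an ASCII quote was intended.
    line = "with open(\u201cfile.txt) as f:"
    for col, message in find_invalid_characters(line):
        print(line)
        print(" " * col + "^")
        print("SyntaxError: " + message)
```

Because the check is per character, the caret lands on the curly quote itself rather than somewhere later in the line.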