On Sun, May 10, 2020 at 02:13:37PM -0400, Richard Damon wrote:
> A lot of this reminds me of a story told by a programming instructor
> in the 70's, he submitted a FORTRAN program deck to the machine, the
> compiler gave him a warning on a statement which read INTEGER
> misspelled, it then ran the program, but IGNORED the statement, even
> though it clearly understood what he meant,

How did the compiler understand what he meant? If INTEGER was misspelled, how is the compiler supposed to know that ITEGER or INREGER or whatever misspelling he used was actually supposed to mean INTEGER?

> and got wrong answers because the compiler just used the default REAL
> type for the variable, which took him a while to figure out what the
> error was.

Yes, programming was harder in the 1970s. The tooling was limited and inconvenient.

> An error like character (whatever) is not a quote (or is not a minus
> sign) seems similar. It is one thing to not recognize a funny
> character in the language, but to actually parse it well enough to
> give a message that says in effect, that may look like a quote to you,
> but I am not going to treat it as one, sounds perverse in the language.

It might *sound* perverse, but what is genuinely perverse is Do What I Mean systems that try to *guess* what you mean rather than allowing the user to correct their own mistake. Only the user truly knows what they intended.

http://www.catb.org/jargon/html/D/DWIM.html

DWIM will just train beginners to be lazy, sloppy, thoughtless coders, since "the interpreter knows what I meant" -- until it doesn't. Even if the DWIM gets it right 9 times out of 10, the pain and difficulty in that remaining case will outweigh the convenience of the other 9 times.

> If we are going to go to the effort to detect that particular
> character, it makes more sense to make it actually DO the obvious
> thing.

Is `x‒y` meant to be an identifier with a hyphen, or the subtraction x−y? How about `x‐y` or `x‑y` or `x–y` or `xーy`?

(All of the above are distinct Unicode dash-like characters. Only one of them is an actual minus sign.)

In 2020 we don't have to wait two weeks for our next share of computer time. Correcting an error and re-running the code is easy. There is no real advantage in having the interpreter try to guess what the user probably meant to write, instead of running what they actually wrote and failing if it is not legal code.
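
To make the difference between these look-alikes concrete, here is a minimal sketch using the standard `unicodedata` module to print each character's code point, official name and general category. The code points in `candidates` are an assumption: they are one reading of the characters used in the examples above, with the ASCII hyphen-minus added for comparison, and the helper name `describe` is purely illustrative.

```python
import unicodedata

# Assumed code points for the dash-like characters discussed above,
# written as escapes so they cannot be confused with one another.
candidates = [
    "\u002d",  # the ASCII character Python accepts as the minus operator
    "\u2012",
    "\u2212",
    "\u2010",
    "\u2011",
    "\u2013",
    "\u30fc",
]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in candidates:
    code, name, category = describe(ch)
    print(f"{code}  {category}  {name}")
```

Every entry gets a different name, which is the point: characters that render almost identically are unrelated as far as the tokenizer is concerned.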

> If not, the current error seems fine, especially if we could include
> more details. An 'invalid character' message, that doesn't tell you
> WHICH character is invalid seems like it is holding back. If it
> included the bad character, or pointed to it, then the error becomes
> a lot more clear.

The SyntaxError already points at, or just after, the invalid character.

py> x−y
  File "<stdin>", line 1
    x−y
     ^
SyntaxError: invalid character in identifier

--
Steven