On Sun, May 10, 2020 at 07:09:15AM +0000, Steve Barnes wrote about Unicode dashes and quotes sneaking into code:
What can be done?
1. Persuade Microsoft, and others, to stop being so helpful by default - good luck with that!
No, I think that in the broader picture, they are doing the correct thing by using nicer typographical quotes and dashes for non-source code. Even if they would listen, we should not ask :-)
2. Tell all users that they need to use a "proper" editor or IDE - This seems like adding an additional barrier to new & casual users.
When people decide to learn, say, woodworking or carpentry, and try to make holes in timber by gouging the wood with a screwdriver^1 but are told to get themselves a drill instead, is this seen as "an additional barrier" or just part of the process of learning a new skill set? A cheap drill costs about AUD$50 and another $25 for a set of drill bits. A cheap IDE or programmer's editor costs nothing but a bit of time and hard disk space. I think we can expect would-be programmers to *not* use MS Word to write Python code. If they aren't willing to invest the time and energy to install one, then they probably won't invest the time and energy to learn how to program either.
3. Better yet tell them to use a "proper" OS like .... - At the very least many of us have to use Windows at work.
It's perfectly possible to write code on Windows without paying lots of money for expensive commercial IDEs.
4. Start accepting hyphens as minus & Unicode quotation marks - this would be the ideal answer for pasted code but has a lot of possible things to iron out, such as whether we require that the quotes match and are in the typographically correct order. It is also quite a big & complex change to the Python interpreter.
Python already accepts hyphens as minus -- only *one* kind though, the so-called ASCII "HYPHEN-MINUS". What it doesn't accept is actual minus signs, '\N{MINUS SIGN}', as minus signs. I don't mind seeing rich Unicode in strings, or even comments, but I wouldn't (yet!) want to see it in executable code. I don't think the state of the art of editing tooling and font support is quite ready for it yet. I still see far too many "Missing Character Glyphs" and supposedly monospaced text where there's always *one* character that is a single pixel short of the consistent spacing. And I still don't know how to type − in my editor; I have to copy and paste it from elsewhere.
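To make the distinction concrete, here is a minimal sketch that compiles the same subtraction twice, once with each character. The helper name `compiles` is made up for this illustration. The exact wording of the error differs between Python versions, so the sketch only checks whether a SyntaxError is raised.

```python
import unicodedata

HYPHEN_MINUS = "-"             # U+002D, the only "minus" the compiler accepts
MINUS_SIGN = "\N{MINUS SIGN}"  # U+2212, the typographic minus sign


def compiles(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


for ch in (HYPHEN_MINUS, MINUS_SIGN):
    expr = "5 " + ch + " 3"
    status = "accepted" if compiles(expr) else "rejected"
    print("U+%04X %s: %s" % (ord(ch), unicodedata.name(ch), status))

# Inside a string literal the minus sign is just data, so this compiles.
print(compiles('"5 ' + MINUS_SIGN + ' 3"'))
```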
5. Normalise the input to the Python interpreter (at least for these characters and possibly a few others) so that entering or reading from a file S1 = “Double Quoted” becomes S1 = "Double Quoted", etc. - this should be an easier change to the interpreter but, from a purist point of view, could be said to make us as bad as the others because we are not honouring what the user entered.
I think people should experiment with preprocessors to get a feel for how well they work before moving it into the interpreter. I think that David(?) may have a Vim or Emacs mode that allows him to use Unicode chars as syntax?
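For anyone who wants to try, a preprocessor experiment can be very small. This is only a sketch under loud assumptions: the table `ASCII_EQUIVALENTS` and the function `asciify` are names invented here, the choice of characters is mine, and the translation is deliberately naive. It rewrites *every* occurrence, including those inside string literals and comments, which is exactly the "not honouring what the user entered" worry from point 5.

```python
# Hypothetical table: typographic characters mapped to ASCII stand-ins.
ASCII_EQUIVALENTS = str.maketrans({
    "\N{LEFT DOUBLE QUOTATION MARK}": '"',
    "\N{RIGHT DOUBLE QUOTATION MARK}": '"',
    "\N{LEFT SINGLE QUOTATION MARK}": "'",
    "\N{RIGHT SINGLE QUOTATION MARK}": "'",
    "\N{MINUS SIGN}": "-",
    "\N{EN DASH}": "-",
})


def asciify(source):
    """Naively replace typographic quotes and dashes everywhere in `source`."""
    return source.translate(ASCII_EQUIVALENTS)


# Source text as it might arrive after a round trip through a word processor.
pasted = (
    "S1 = \N{LEFT DOUBLE QUOTATION MARK}Double Quoted"
    "\N{RIGHT DOUBLE QUOTATION MARK}"
)

namespace = {}
exec(asciify(pasted), namespace)  # run the cleaned-up source
print(namespace["S1"])
```

A more careful version would have to leave the contents of string literals alone, and that is the part I would want to see tried out before anything goes near the interpreter.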
6. Change the error message "SyntaxError: invalid character in identifier" to include which character and its Unicode value so that it becomes "SyntaxError: invalid character 0x201c “ in identifier" - this is almost certainly the easiest change and fits well with explicit is better than implicit but still leaves it to the user to correct the erroneous input (which could be argued is both good and bad).
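The information such a message would need is cheap to compute. As a rough sketch, and not what the interpreter itself does, the following reports every non-ASCII character in a piece of source with its position, code point and Unicode name. The function names `describe` and `non_ascii_report` are invented for this example, and the check is blunt: it also flags legitimate non-ASCII text in strings, comments and identifiers.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME'."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))


def non_ascii_report(source):
    """Return (line, column, description) for each non-ASCII character."""
    report = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                report.append((lineno, col, describe(ch)))
    return report


pasted = (
    "S1 = \N{LEFT DOUBLE QUOTATION MARK}Double Quoted"
    "\N{RIGHT DOUBLE QUOTATION MARK}"
)

for lineno, col, text in non_ascii_report(pasted):
    print("line %d, column %d: %s" % (lineno, col, text))
```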
More informative error messages are good :-)
^1 I have literally done that, when I was too lazy to go into the garage and get the drill. So I stabbed at the timber enough to make an indentation so the screw would bite. It actually works! Just not well.
-- Steven