Apologies if this has already been discussed to death.
Python 3 allows Unicode characters in strings and identifiers, but the string delimiters themselves are only accepted as plain ASCII quotation marks, so the following both successfully initialise strings:
S1 = "Double Quoted" # Opened and closed with chr(34) (0x22)
S2 = 'Single Quoted' # Opened and closed with chr(39) (0x27)
But both of the following fail with “SyntaxError: invalid character in identifier”:
S1 = “Double Quoted” # Opened with \u201c and closed with \u201d
S2 = ‘Single Quoted’ # Opened with \u2018 and closed with \u2019
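One way to see which characters were actually typed is to ask Python's `unicodedata` module for their names. The sketch below (`describe_non_ascii` is an illustrative helper, not a standard function) lists every non-ASCII character in a line and uses `compile()` to confirm that only the ASCII version parses. The exact wording of the error differs between Python versions, so it is printed rather than assumed.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]

good = 'S1 = "Double Quoted"'           # ASCII delimiters, 0x22
bad = "S1 = \u201cDouble Quoted\u201d"  # typographic delimiters

compile(good, "<good>", "exec")  # parses without complaint

try:
    compile(bad, "<bad>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)  # wording differs between Python versions

for ch, code, name in describe_non_ascii(bad):
    print(code, name)
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
```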
To the experienced eye, and depending on the font used, it is “obvious” what the problem is: the wrong quotation marks were used. The big problem, especially for beginners, is that the same keys were typed, just in the “wrong” editor, or even the wrong editor mode or context. For example, I have found that Outlook leaves the quotes alone if the font is Fixedsys or I am replying to a plain-text email, but is otherwise “helpful”. Unfortunately, especially on Windows, “wrong” editors abound and include, but are not limited to, MS-Outlook, MS-Word and some online editing environments such as Quora.
On top of that, there is the “helpful” substitution of an en dash (U+2013) for the hyphen-minus once you press space after the following word, so:
A = 3 – 2 # Space typed after the 2, so the hyphen became \u2013: SyntaxError
A = 3 - 2 # No space or CR typed after the 2, so it stays as 0x2d: OK
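The same check applies to the minus sign. In this sketch, the hypothetical helper `try_assignment` runs each line with `exec()` in a fresh namespace and returns the value of `A`, or `None` if the line does not even parse:

```python
import unicodedata

def try_assignment(src):
    """Run src in a fresh namespace; return the value of A, or None on SyntaxError."""
    namespace = {}
    try:
        exec(src, namespace)
    except SyntaxError:
        return None
    return namespace["A"]

typed_ok = "A = 3 - 2"        # hyphen-minus, 0x2D
typed_bad = "A = 3 \u2013 2"  # en dash, U+2013, as substituted by the editor

for src in (typed_ok, typed_bad):
    op = src.split()[3]  # the operator character between 3 and 2
    print(f"U+{ord(op):04X}", unicodedata.name(op), "->", try_assignment(src))
# U+002D HYPHEN-MINUS -> 1
# U+2013 EN DASH -> None
```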
Use cases that catch people out:
I am sure that many of us have encountered these issues or similar ones.
What can be done?
I would like to suggest that an incremental approach might be best: first, clarify the existing error message, which should not break anything; then, as a later enhancement, either substitute the problem characters or process them “properly”.
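As a rough illustration of the two stages, here is a sketch that works outside the interpreter, assuming a small hand-picked table of look-alike characters. `LOOKALIKES`, `explain`, `substitute` and `compile_with_hints` are all hypothetical names, not part of CPython: `explain` stands in for a clearer error message, and `substitute` for the later substitution step.

```python
import unicodedata

# Hypothetical, hand-picked table of typographic look-alikes and their ASCII equivalents.
LOOKALIKES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
}

def explain(source):
    """Stage 1: one hint per look-alike character, with line, column and suggested replacement."""
    hints = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKES:
                hints.append(
                    f"line {lineno}, column {col}: {unicodedata.name(ch)} "
                    f"(U+{ord(ch):04X}); did you mean {LOOKALIKES[ch]!r}?"
                )
    return hints

def substitute(source):
    """Stage 2: replace every look-alike with its ASCII equivalent."""
    return source.translate(str.maketrans(LOOKALIKES))

def compile_with_hints(source, filename="<input>"):
    """Compile source; if it fails, re-raise with look-alike hints appended to the message."""
    try:
        return compile(source, filename, "exec")
    except SyntaxError as err:
        hints = explain(source)
        if not hints:
            raise
        raise SyntaxError("\n  ".join([err.msg, *hints])) from err

source = "S1 = \u201cDouble Quoted\u201d\nA = 3 \u2013 2\n"
try:
    compile_with_hints(source)
except SyntaxError as err:
    print(err.msg)  # original message, then one hint per look-alike character

namespace = {}
exec(compile_with_hints(substitute(source)), namespace)
print(namespace["S1"], namespace["A"])
```

Note that `substitute` blindly rewrites every look-alike, including curly quotes that legitimately belong inside a string literal (e.g. “It’s”), which is one reason handling them “properly” would probably need support in the tokenizer itself.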