[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

11 May 2020

      10.05.20 10:09, Steve Barnes wrote:
...
4. Start accepting hyphens as minus & Unicode quotation marks – this
    would be the ideal answer for pasted code but has a lot of possible
    things to iron out such as do we require that the quotes match and
    are in the typographically correct order. It is also quite a big &
    complex change to the python interpreter.
Two consecutive hyphens can look like a dash, and can be replaced with a 
dash by a "typographer", but they have a different meaning than a single minus.
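
A minimal sketch of the problem, for illustration only. The "typographer" substitution and the dash-to-minus mapping below are hypothetical; they stand in for what an editor and a more lenient interpreter might do.

```python
# "--" is two operators in Python, not one dash.
a, b = 5, 3

double_hyphen = a--b   # parsed as a - (-b)
single_minus = a-b     # ordinary subtraction

print(double_hyphen)   # 8
print(single_minus)    # 2

# Hypothetical "smart" editor substitution: "--" becomes EN DASH (U+2013).
pasted = "a--b".replace("--", "\u2013")

# If the interpreter then accepted that dash as a minus sign, the
# expression would silently change from a - (-b) to a - b.
as_minus = pasted.replace("\u2013", "-")
result = eval(as_minus, {"a": a, "b": b})
print(result)          # 2
```

So code that was valid before the editor touched it would still run, but with a different result.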
...
5. Normalise the input to the python interpreter (at least for these
    characters and possibly a few others) so that entering or reading
    from a file S1 = “Double Quoted” becomes S1 = "Double Quoted", etc. –
    this should be an easier change to the interpreter but, from a purist
    point of view, could be said to make us as bad as the others because
    we are not honouring what the user entered.
It is ambiguous. For example, in Ukraine we use pairs of quotation marks 
« and » or „ and “. But “ is used as an opening quotation mark in 
English, and » and « are used with the opposite meaning in Swedish. The 
single low-9 quotation mark ‚ can be confused with a comma, and the single 
angle quotation marks ‹ and ❮ can be confused with <.
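
As a rough illustration, the standard `unicodedata` module shows how Unicode itself classifies some of these characters. The selection of marks and the helper name `describe` are mine, not part of any proposal.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# « » „ “ ‚ ‹ written as escapes so the source stays ASCII.
marks = ["\u00ab", "\u00bb", "\u201e", "\u201c", "\u201a", "\u2039"]

for ch in marks:
    print(*describe(ch))

# U+201C is category Pi ("initial quote"), yet in the pair „ ... “
# it is the closing mark. U+201E and U+201A are category Ps (open
# punctuation), the same category as "(" and "[".
```

The general category reflects one convention only, so it cannot tell a normaliser whether a given mark opens or closes a string in the text at hand.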
...
6. Change the error message “SyntaxError: invalid character in
    identifier” to include which character and its Unicode value so
    that it becomes “SyntaxError: invalid character 0x201c “ in
    identifier” – this is almost certainly the easiest change and fits
    well with explicit is better than implicit but still leaves it to
    the user to correct the erroneous input (which could be argued is
    both good and bad).
https://bugs.python.org/issue40593

Also, "in identifier" is incorrect in most cases, because the invalid 
character usually does not look like part of an identifier.
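
A small sketch of what a more informative message could contain. `describe_invalid_char` is a hypothetical helper written for this example; it is not the interpreter's code, and the exact wording of the real message depends on the Python version.

```python
import unicodedata

def describe_invalid_char(ch):
    """Hypothetical helper: name the offending character and its code point."""
    name = unicodedata.name(ch, "unnamed character")
    return f"invalid character {ch!r} (U+{ord(ch):04X}, {name})"

# The pasted assignment from the proposal, with curly quotes.
source = "S1 = \u201cDouble Quoted\u201d"

try:
    compile(source, "<pasted>", "exec")
    raised = False
except SyntaxError as exc:
    raised = True
    print("interpreter says:", exc.msg)
    print("sketch says:     ", describe_invalid_char("\u201c"))
```

Note that the sketch does not mention an identifier at all, since the offending character here starts what the user meant as a string literal.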

Serhiy Storchaka