Re: [Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers

4 May 2014

On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
...
Note that Unicode itself considers them *compatibility* characters and
says:
    Superscripts and subscripts have been included in the Unicode
    Standard only to provide compatibility with existing character
    sets.  In general, the Unicode character encoding does not attempt
    to describe the positioning of a character above or below the
    baseline in typographical layout.
In other words, Unicode is reluctant to guarantee that x2, x², and x₂
are actually different identifiers!
[...]
I don't think this is a valid interpretation of what the Unicode 
standard is trying to say, but the point is moot. I think you've just 
identified (pun intended) a major objection to the proposal, one serious 
enough to change my mind from limited support to opposition.

Python compares identifiers by their NFKC normalised form:

    All identifiers are converted into the normal form NFKC while 
    parsing; comparison of identifiers is based on NFKC.

https://docs.python.org/3/reference/lexical_analysis.html

And superscripts and subscripts normalise to standard characters:

py> import unicodedata
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']

That categorically rules out allowing superscripts and subscripts as 
*distinct* characters in identifiers. Even if they were allowed, x² and 
x₂ would be treated as the same identifier as x2.
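
To make that concrete, here is a minimal sketch using a character that 
current Python already accepts in identifiers and that also has a 
compatibility decomposition: MATHEMATICAL BOLD SMALL X (U+1D431). The 
names `source` and `namespace` are only illustrative, and this shows how 
superscripts would presumably behave if they were ever permitted, not 
something they do today.

```python
import unicodedata

bold_x = "\N{MATHEMATICAL BOLD SMALL X}"  # U+1D431, renders like a bold "x"

# NFKC folds the compatibility character to plain ASCII "x".
assert unicodedata.normalize("NFKC", bold_x) == "x"

# Superscript and subscript digits are rejected in identifiers today.
assert not "x\u00b2".isidentifier()
assert not "x\u2082".isidentifier()

# Assign through the bold spelling, then read through the plain one.
source = bold_x + " = 1\nresult = x + 1\n"
namespace = {}
exec(source, namespace)

# Only the normalised name is stored: both spellings were one identifier.
assert namespace["result"] == 2
assert "x" in namespace and bold_x not in namespace
```

The assignment and the lookup use visually different spellings, yet they 
hit the same binding, which is exactly what would happen to x², x₂ and x2.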

For my use-case, I would want x² and x₂ to be treated as distinct 
identifiers, not just as a funny way of writing x2. So from my 
perspective, *at best* there is now insufficient benefit to bother 
allowing them.

It's actually stronger than that: allowing superscripts and subscripts 
would be an attractive nuisance for my use-case. If they were allowed, I 
would be tempted to write x² and x₂, which could end up being a subtle 
source of bugs if I accidentally used them both in the same namespace, 
thinking that they were distinct when they actually aren't. So I am now 
-1 on allowing superscripts and subscripts.
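
As a rough illustration of the kind of accident I mean, the sketch below 
groups candidate names by their NFKC form and reports any group with more 
than one spelling. `find_nfkc_collisions` is a hypothetical helper written 
for this message, not anything in the standard library.

```python
import unicodedata
from collections import defaultdict


def find_nfkc_collisions(names):
    """Return {normalised_name: [spellings]} for names that collide under NFKC."""
    groups = defaultdict(set)
    for name in names:
        groups[unicodedata.normalize("NFKC", name)].add(name)
    # Keep only the normalised forms reached by two or more distinct spellings.
    return {
        norm: sorted(spellings)
        for norm, spellings in groups.items()
        if len(spellings) > 1
    }


# x2, x-superscript-2 and x-subscript-2 all collapse to "x2"; "y" stands alone.
collisions = find_nfkc_collisions(["x2", "x\u00b2", "x\u2082", "y"])
assert collisions == {"x2": ["x2", "x\u00b2", "x\u2082"]}
```

Three names that look distinct on the page would share a single binding, 
and nothing in the language would warn about it.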

-- 
Steven

Steven D'Aprano