Re: [Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers

4 May 2014

On Sunday 04 May 2014 12:40:44 Steven D'Aprano wrote:
...
On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
...
Note that Unicode itself considers them *compatibility* characters and
says:
    Superscripts and subscripts have been included in the Unicode
    Standard only to provide compatibility with existing character
    sets.  In general, the Unicode character encoding does not attempt
    to describe the positioning of a character above or below the
    baseline in typographical layout.
In other words, Unicode is reluctant to guarantee that x2, x², and x₂
are actually different identifiers!
[...]
I don't think this is a valid interpretation of what the Unicode 
standard is trying to say, but the point is moot. I think you've just 
identified (pun intended) a major objection to the proposal, one serious 
enough to change my mind from limited support to opposition.
Python identifiers are treated by their NFKC normalised form:
    All identifiers are converted into the normal form NFKC while
    parsing; comparison of identifiers is based on NFKC.
https://docs.python.org/3/reference/lexical_analysis.html
And superscripts and subscripts normalise to standard characters:
py> import unicodedata
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']
So that categorically rules out allowing superscripts and subscripts as 
*distinct* characters in identifiers. So even if they were allowed, it 
would mean that x² and x₂ would be treated as the same identifier as x2.
For my use-case, I would want x² and x₂ to be treated as distinct 
identifiers, not just as a funny way of writing x2. So from my 
perspective, *at best* there is now insufficient benefit to bother 
allowing them.
It's actually stronger than that: allowing superscripts and subscripts 
would be an attractive nuisance for my use-case. If they were allowed, I 
would be tempted to write x² and x₂, which could end up being a subtle 
source of bugs if I accidentally used them both in the same namespace, 
thinking that they were distinct when they actually aren't. So I am now 
-1 on allowing superscripts and subscripts.
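
The hazard described above can already be observed in current Python 3 with compatibility characters that are accepted in identifiers. The following is only a minimal sketch: the choice of "ℍ" (U+210D) as the example character and the use of exec() with a scratch dictionary named `namespace` are assumptions made for illustration, not part of the original message.

```python
# Superscripts and subscripts are currently rejected as identifier characters.
print("x²".isidentifier())
print("x₂".isidentifier())

# "ℍ" (U+210D) is accepted, but the parser folds it to "H" via NFKC,
# so the two assignments below bind one and the same name.
namespace = {}
exec("ℍ = 1\nH = 2", namespace)

# Only the normalised spelling ends up in the namespace.
user_names = sorted(k for k in namespace if not k.startswith("__"))
print(user_names)
print(namespace["H"])
```

If superscripts and subscripts were allowed under the same rule, x² and x₂ would silently collapse into x2 in the same way.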

That's the strongest point against allowing superscripts and subscripts in the whole discussion, IMHO. I would want x² and x₂ to be treated as distinct identifiers too.

I've tried this use case in Julia and it works:
julia> x₂ = 1
1

julia> x² = 2
2

julia> x₂
1

julia> x²
2

But then I found a thread in Julia's bug tracker covering normalization of Unicode identifiers [1]. As I understand it, they don't use NFKC. As a consequence, the symbols "µ" (U+00B5, micro sign) and "μ" (U+03BC, Greek small letter mu) are treated as different identifiers. They recognize that this is weird and that they need to do something about it. Some of them don't want to use NFKC for the same reason as above (plus, for example, "H" and "ℍ" would become equal identifiers). Others decided to give a warning when a new identifier is equal to an already defined one (in terms of NFKC normalization).
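
To make the "warn on NFKC-equal identifiers" idea concrete, here is a rough sketch in Python. It is not taken from the Julia thread: the helper name `nfkc_collisions` and the sample list of names are hypothetical, and the sketch only illustrates how such a check could group identifiers that differ as written but are equal after NFKC normalization.

```python
import unicodedata


def nfkc_collisions(names):
    """Group distinct identifiers that become equal under NFKC."""
    groups = {}
    for name in names:
        key = unicodedata.normalize("NFKC", name)
        group = groups.setdefault(key, [])
        if name not in group:  # ignore exact repeats of the same spelling
            group.append(name)
    # Keep only the normal forms reached by more than one spelling.
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical sample: micro sign vs Greek mu, and H vs double-struck H.
names = ["\u00b5", "\u03bc", "H", "\u210d", "x"]
collisions = nfkc_collisions(names)
for key, group in collisions.items():
    print(f"warning: {group} all normalise to {key!r}")
```

A check like this keeps the identifiers distinct, as Julia does, while still flagging the pairs that a reader could easily confuse.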

Now I understand that things are more complicated than I thought when I made the proposal. I think there is no "good way" to add support for subscripts and superscripts, so it's better to leave the situation as is.

-- 
Regards, Roman Inflianskas

--------
[1] Thread in Julia's bug tracker covering Unicode identifier normalization