On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
> Note that Unicode itself considers them *compatibility* characters and says:
>
>     Superscripts and subscripts have been included in the Unicode Standard only to provide compatibility with existing character sets. In general, the Unicode character encoding does not attempt to describe the positioning of a character above or below the baseline in typographical layout.
>
> In other words, Unicode is reluctant to guarantee that x2, x², and x₂ are actually different identifiers!
I don't think this is a valid interpretation of what the Unicode standard is trying to say, but the point is moot. I think you've just identified (pun intended) a major objection to the proposal, one serious enough to change my mind from limited support to opposition.
Python identifiers are compared by their NFKC-normalised form. The language reference says:

    All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
And superscripts and subscripts normalise to standard characters:
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']
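And this isn't just a property of unicodedata: the parser itself applies the normalisation. Here's a small sketch using a compatibility character that is *already* legal in identifiers, U+FB01 (the "fi" ligature), which NFKC-normalises to the two letters "fi" (the variable names here are made up for illustration):

```python
import unicodedata

# U+FB01 (LATIN SMALL LIGATURE FI) is a legal identifier character
# and NFKC-normalises to the two letters "fi".
assert unicodedata.normalize('NFKC', '\ufb01le') == 'file'

# An assignment through one spelling is visible through the other,
# because the parser normalises both to the same identifier.
ns = {}
exec('\ufb01le = 42\nresult = file\n', ns)
print(ns['result'])  # 42
```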
That categorically rules out treating superscripts and subscripts as *distinct* characters in identifiers: even if they were allowed, x² and x₂ would be treated as the same identifier as x2.
For my use-case, I would want x² and x₂ to be treated as distinct identifiers, not just as a funny way of writing x2. So from my perspective, *at best* there is now insufficient benefit to bother allowing them.
It's actually stronger than that: allowing superscripts and subscripts would be an attractive nuisance for my use-case. If they were allowed, I would be tempted to write x² and x₂, which could end up being a subtle source of bugs if I accidentally used them both in the same namespace, thinking that they were distinct when they actually aren't. So I am now -1 on allowing superscripts and subscripts.
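To illustrate the bug class I mean, here's a sketch using a compatibility character Python *does* accept today: fullwidth 'ｘ' (U+FF58), which NFKC-normalises to plain ASCII 'x'. The two spellings look distinct on screen but silently share one binding:

```python
# Fullwidth 'x' (U+FF58) is a letter, so it is legal in identifiers,
# and NFKC-normalises to ASCII 'x'.  The second assignment below
# therefore overwrites the first rather than creating a new name.
ns = {}
exec('x = 1\n\uff58 = 2\n', ns)
print(ns['x'])  # 2 -- both spellings are the same identifier
```

This is exactly the silent-aliasing trap that allowing superscripts and subscripts would set for anyone who thought x² and x₂ were different names.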