[Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers
Steven D'Aprano
steve at pearwood.info
Sun May 4 04:40:44 CEST 2014
On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
> Note that Unicode itself considers them *compatibility* characters and
> says:
>
> Superscripts and subscripts have been included in the Unicode
> Standard only to provide compatibility with existing character
> sets. In general, the Unicode character encoding does not attempt
> to describe the positioning of a character above or below the
> baseline in typographical layout.
>
> In other words, Unicode is reluctant to guarantee that x2, x², and x₂
> are actually different identifiers!
[...]
I don't think this is a valid interpretation of what the Unicode
standard is trying to say, but the point is moot. I think you've just
identified (pun intended) a major objection to the proposal, one serious
enough to change my mind from limited support to opposition.
Python identifiers are compared by their NFKC normalised form:
    All identifiers are converted into the normal form NFKC while
    parsing; comparison of identifiers is based on NFKC.
    https://docs.python.org/3/reference/lexical_analysis.html
And superscripts and subscripts normalise to standard characters:
py> import unicodedata
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']
That categorically rules out allowing superscripts and subscripts as
*distinct* characters in identifiers. Even if they were allowed, x² and
x₂ would be treated as the same identifier as x2.
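To make the point concrete, here is a rough sketch that reports, for each of the three spellings, its NFKC form and whether Python currently accepts it as an identifier. The names `candidates`, `describe` and `report` are mine, purely for illustration; only `unicodedata.normalize` and `str.isidentifier` come from the standard library.

```python
import unicodedata

# The three spellings under discussion: x2, x² (U+00B2), x₂ (U+2082).
candidates = ["x2", "x\u00b2", "x\u2082"]


def describe(name):
    """Return (NFKC form, whether Python currently accepts the name)."""
    return unicodedata.normalize("NFKC", name), name.isidentifier()


report = {name: describe(name) for name in candidates}
for name, (folded, accepted) in report.items():
    print(f"{name!r}: NFKC={folded!r}, isidentifier={accepted}")
```

All three fold to the same NFKC string, and only the plain ASCII spelling is accepted as an identifier today.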
For my use-case, I would want x² and x₂ to be treated as distinct
identifiers, not just as a funny way of writing x2. So from my
perspective, *at best* there is now insufficient benefit to bother
allowing them.
It's actually stronger than that: allowing superscripts and subscripts
would be an attractive nuisance for my use-case. If they were allowed, I
would be tempted to write x² and x₂, which could end up being a subtle
source of bugs if I accidentally used them both in the same namespace,
thinking that they were distinct when they actually aren't. So I am now
-1 on allowing superscripts and subscripts.
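The kind of bug I have in mind can already be reproduced with a compatibility character that Python does accept. This is only an analogy, not the proposal itself: it assumes MATHEMATICAL BOLD SMALL X (U+1D431) as a stand-in, since that character is a legal identifier character today and NFKC-normalises to a plain "x". The names `source`, `namespace` and `names` are illustrative.

```python
# Two assignments that look like they target different variables:
# one to U+1D431 (mathematical bold x) and one to ASCII x.
source = "\U0001d431 = 'bold'\nx = 'plain'\n"

namespace = {}
exec(source, namespace)

# Ignore the __builtins__ entry that exec() adds to the namespace.
names = [key for key in namespace if not key.startswith("__")]
print(names, namespace["x"])
```

Only one name survives, and the second assignment silently overwrites the first. Superscripts and subscripts would behave the same way if they were ever admitted under the current normalisation rule.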
--
Steven