[Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers

Steven D'Aprano steve at pearwood.info
Sun May 4 04:40:44 CEST 2014


On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:

> Note that Unicode itself considers them *compatibility* characters and
> says:
> 
>     Superscripts and subscripts have been included in the Unicode
>     Standard only to provide compatibility with existing character
>     sets.  In general, the Unicode character encoding does not attempt
>     to describe the positioning of a character above or below the
>     baseline in typographical layout.
> 
> In other words, Unicode is reluctant to guarantee that x2, x², and x₂
> are actually different identifiers!
[...]

I don't think this is a valid interpretation of what the Unicode 
standard is trying to say, but the point is moot. I think you've just 
identified (pun intended) a major objection to the proposal, one serious 
enough to change my mind from limited support to opposition.

Python identifiers are compared by their NFKC normalised form:

    All identifiers are converted into the normal form NFKC while 
    parsing; comparison of identifiers is based on NFKC.

https://docs.python.org/3/reference/lexical_analysis.html

And superscripts and subscripts normalise to standard characters:

py> import unicodedata
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']

That categorically rules out allowing superscripts and subscripts as 
*distinct* characters in identifiers: even if they were allowed, x² and 
x₂ would be treated as the same identifier as x2.

For my use-case, I would want x² and x₂ to be treated as distinct 
identifiers, not just as a funny way of writing x2. So from my 
perspective, *at best* there is now insufficient benefit to bother 
allowing them.

It's actually stronger than that: allowing superscripts and subscripts 
would be an attractive nuisance for my use-case. If they were allowed, I 
would be tempted to write x² and x₂, which could end up being a subtle 
source of bugs if I accidentally used them both in the same namespace, 
thinking that they were distinct when they actually aren't. So I am now 
-1 on allowing superscripts and subscripts.
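
The same kind of collision can already be seen with compatibility 
characters that are legal in identifiers today. The following is only a 
sketch: the ligature U+FB01 and the name "file" are my own choice of 
illustration, not part of the proposal, and exec() is used just to keep 
the example self-contained.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI is a letter, so it is already legal in an
# identifier, and NFKC maps it to the two ASCII letters "fi".
ligature_name = "\ufb01le"   # renders like "file", but is 3 code points
plain_name = "file"          # 4 ASCII code points

# As strings, the two spellings are different...
assert ligature_name != plain_name
# ...but they share one NFKC normal form.
assert unicodedata.normalize("NFKC", ligature_name) == plain_name

# Bind the ligature spelling, then update the plain spelling.
source = f"{ligature_name} = 1\n{plain_name} += 1\n"
namespace = {}
exec(source, namespace)

# Only one name was ever bound, under its normalised spelling.
names = sorted(k for k in namespace if not k.startswith("__"))
print(names, namespace["file"])   # ['file'] 2
```

If superscripts and subscripts were allowed under the current NFKC rule, 
x², x₂ and x2 would presumably collapse to a single binding in just the 
same way.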


-- 
Steven

