[Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers

Sat May 3 19:57:03 CEST 2014

On Sat, May 03, 2014 at 11:39:23AM -0400, Ron Adam wrote:
> 
> 
> On 05/03/2014 05:05 AM, Steven D'Aprano wrote:
> >On Sat, May 03, 2014 at 06:38:21PM +1200, Greg Ewing wrote:
> 
> >>>Steven D'Aprano wrote:
> 
> >>>> >Particularly for mathematically-focused code, I think it would be useful
> >>>> >to be able to use identifiers like (say) σ² for variance,
> 
> >>>Having σ² be a variable name could be confusing. To a
> >>>mathematician, it's not a distinct variable, it's
> >>>just σ ** 2.
> 
> >Actually, not really. A better way of putting it is that the standard
> >deviation is "just" the square root of σ². Variance comes first (it's
> >defined from first principles), and then the standard deviation is
> >defined by taking the square root.
> 
> 
> The main problem I see is that many possible questions come to mind rather 
> than one simple or obvious interpretation.

If I name a variable "x2", what is the "one simple or obvious 
interpretation" that such an identifier presumably has? If standard, 
ASCII-only identifiers don't have a single interpretation, why should 
identifiers like σ² be held to that requirement?

Like any other identifier, one needs to interpret the name in context. 
Identifiers can be idiomatic ("i" for a loop variable, "c" for a 
character), more or less descriptive ("number_of_pages", "npages"), or 
obfuscated ("e382702"). They can be written in English, or in some other 
language. They can be ordinary words, or jargon that only means 
something to those who understand the problem domain. None of this will 
be different if sub/superscript digits and letters are allowed.

One of the frustrations on this list is how often people hold new 
proposals to a higher standard than existing features. Particularly 
*impossible* standards. It simply isn't possible for characters like 
superscript-two to be given a *single* interpretation (although there is 
an obvious one, namely "squared") any more than it is possible for the 
letter "a" to be given a *single* interpretation.

There are valid objections to this proposal. It may be that the effort 
needed to allow code points like ² in identifiers without also allowing 
½ or ② is too great. Or that the performance cost is too high. Or that 
the benefit for mathematical-style code doesn't justify the additional 
language complexity.
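
To make the first of those objections concrete, here is a minimal sketch 
using only the standard unicodedata module. The helper name compat_tag 
is made up for illustration, and nothing here is a claim about how the 
tokenizer would or should implement such a rule.

```python
import unicodedata

def compat_tag(ch):
    """Return the compatibility tag of ch's decomposition, e.g. '<super>'.

    Hypothetical helper for illustration: returns '' when the character
    has no decomposition or only a canonical (untagged) one.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        return decomp.split()[0]
    return ""

for ch in "²½②":
    # General category, decomposition tag, and whether Python 3
    # currently accepts the character after a valid start like σ.
    print(ch, unicodedata.category(ch), compat_tag(ch),
          ("σ" + ch).isidentifier())
```

All three characters share the general category "No" (Number, other), so 
a rule based on category alone cannot admit ² while rejecting ½ and ②. 
The decomposition tags do differ (<super>, <fraction>, <circle>), which 
is one property a rule could conceivably key on. Whether that would be 
cheap enough, or worth the complexity, is exactly the open question.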

Or even a purely aesthetic judgement: "I just don't like it". (I don't 
like identifiers written in Cyrillic, because I can't read them, but I'm 
not the target audience for such identifiers and I will never need to 
read them. Consequently I don't object if other people use Cyrillic 
identifiers in their personal code.)

Holding this proposal up to an impossible standard which plain ASCII 
identifiers don't even meet is simply not cricket.

Thank you all for letting me get that off my chest, and apologies to Ron 
for singling him out.

-- 
Steven