[Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers

Sat May 3 23:48:51 CEST 2014

On 5/3/2014 12:50 AM, Steven D'Aprano wrote:
> On Fri, May 02, 2014 at 10:27:56PM -0400, Terry Reedy wrote:
>
>> If the rules for identifiers are expanded, any code that uses newly
>> allowed names cannot be backported or run on previous versions. If
>> contracted, the opposite problem occurs. I do not think they should be
>> changed either way without a strong cause.
>
> That applies to any new feature -- code using that feature cannot be
> easily backported. In this case, it's actually quite simple to backport
> code using the new rules for identifiers: just change the identifiers.
> The algorithm used by the code remains the same.

It appears that I consider lexicography more 'fundamental' in some sense
than you do. But let's skip over this.

>>>  From 2.3. Identifiers and keywords
>>> "The syntax of identifiers in Python is based on the Unicode standard
>>> annex UAX-31, with elaboration and changes as defined below; see also
>>> PEP 3131 for further details."

Without reading the annex, I cannot tell which part of the 'below' 
actually defines a 'change', as opposed to an 'elaboration' 
(explanation). I have no idea whether the unknown changes are additions, 
deletions, or merely selections of options.

>> In other words, we use the standard with a few intentional
>> modifications.
>
> Playing Devil's Advocate, perhaps we could add a few more intentional
> modifications.

Or perhaps not, depending on what the modifications actually are and 
what the reasons were.

> While there are advantages to following a standard just for the sake of
> following a standard, once you allow any changes, you're no longer
> following the standard. So the argument becomes, why should we allow
> that change but not this change?

Nick recently argued, very similarly, that having restored string 'u' 
prefixes was a reason to restore dict.iterxyz methods. You agreed with 
me that there were good reasons why B did not follow from A.

To properly compare the current and proposed changes, we must know the
current 'elaboration and changes', their reasons and effects, and the
proposed changes, their reasons (including any real parallels), and their
likely effects. If you were to do the research, I would be willing to discuss.

> Particularly for mathematically-focused code, I think it would be useful
> to be able to use identifiers like (say) σ² for variance, g₁ for sample
> skewness, or β₂ for Pearson's skewness, to give a few real-world
> examples. Regular digits may be ambiguous: compare s₁² for the sample
> variance with Bessel's correction, versus s12. (s twelve?)

I agree that there are good uses for this restricted set of additions. 
Would you allow super/subscripts as prefixes rather than suffixes? I 
presume not since we already disallow initial numbers.
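
For reference, here is a minimal sketch of how the example names quoted
above fare under the current rules. It only uses str.isidentifier() and
unicodedata.category(); the helper name 'report' and the 'results' dict
are mine, for illustration only.

```python
import unicodedata

# Candidate names taken from the examples quoted above.
candidates = ["σ²", "g₁", "β₂", "s₁²", "s12"]

def report(name):
    """Return (accepted under current rules?, category of each character)."""
    categories = [unicodedata.category(ch) for ch in name]
    return name.isidentifier(), categories

results = {name: report(name) for name in candidates}

for name, (ok, categories) in results.items():
    # !a prints an ASCII-safe repr, so this also works on non-UTF-8 consoles.
    print(f"{name!a}: identifier={ok}, categories={categories}")
```

The superscript and subscript digits are in category No (Number, other),
while the ASCII digits are Nd, which is why only s12 passes today.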

> I'm going to give a tentative +1 vote to allowing superscript and
> subscript letters and digits in identifiers, if it can be done without
> excessive cost in complexity or performance.

Would you consider doubling the cost of checking each character (a 
reasonable estimate, I think) excessive or not?
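
To make the question concrete, here is a rough sketch of what an extended
check might look like in pure Python. Everything in it is an assumption
for discussion: the name is_extended_identifier, the EXTRA_CONTINUE set,
and the choice to admit only the super/subscript digits, and only after
the first character. It says nothing about the real cost in the C
tokenizer; it only shows where a second membership test would go.

```python
# Hypothetical set of extra continuation characters: superscript and
# subscript digits only. Note that ¹ ² ³ live in Latin-1 Supplement,
# not in the "Superscripts and Subscripts" block.
EXTRA_CONTINUE = frozenset(
    chr(cp)
    for cp in [
        0x2070,                  # superscript zero
        0x00B9, 0x00B2, 0x00B3,  # superscript one, two, three
        *range(0x2074, 0x207A),  # superscript four .. nine
        *range(0x2080, 0x208A),  # subscript zero .. nine
    ]
)

def is_extended_identifier(name):
    """Current rules, plus EXTRA_CONTINUE allowed after the first character."""
    # The first character must be a valid start under the current rules,
    # so super/subscripts are never accepted as prefixes.
    if not name or not name[0].isidentifier():
        return False
    for ch in name[1:]:
        # First test: is ch a valid continuation under the current rules?
        if ("a" + ch).isidentifier():
            continue
        # Second test: only reached when the first one fails.
        if ch in EXTRA_CONTINUE:
            continue
        return False
    return True
```

With this sketch, is_extended_identifier("s₁²") is true while
is_extended_identifier("²σ") is false, and the extra test is only paid
for characters that the current rules reject.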

> Anything else, like (say) ⑤ (CIRCLED DIGIT FIVE),
> I will give a firm -1.

-- 
Terry Jan Reedy