Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers
It's really useful that Python 3 allows me to use some Unicode symbols (as specified in the lexical analysis reference [1]), especially Greek symbols for mathematical programs. But when I write a mathematical program with lots of indices, I would like to use symbols from the block "Superscripts and Subscripts" (as id_continue), for example: ⁴₂₍₎ I don't see any problems with allowing yet another subset of Unicode symbols. In Julia, for example, I can use them without problems. -- Regards, Roman Inflianskas -------- [1] https://docs.python.org/3.4/reference/lexical_analysis.html#identifiers
This block includes non-alphanumeric characters. You wouldn't want to allow variables named x⁺¹ (~ x+1). Some of the characters in this block are already allowed (the letters in category Lm). The characters you want are in the No (other numbers) category. Unfortunately, adding that category would be problematic, as it includes characters like ½ and you surely don't want a variable named x½ or x⑴. That's x1/2 and x(1) for those without Unicode fonts. --- Bruce
On Fri, May 2, 2014 at 12:34 PM, Roman Inflianskas <infroma@gmail.com> wrote:
It's really useful that python 3 allows me to use some Unicode symbols (as specified in https://docs.python.org/3.4/reference/lexical_analysis.html#identifiers), especially Greek symbols for mathematical programs. But when I write mathematical program with lots of indices I would like to use symbols from block "Superscripts and Subscripts" (as id_continue), for example:
⁴₂₍₎
I don't see any problems with allowing yet another subset of Unicode symbols. In Julia, for example, I can use them without problems.
--
Regards, Roman Inflianskas
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
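The category argument in Bruce's reply can be checked with the standard `unicodedata` module. The following is only a minimal sketch: the helper name `describe` and the selection of sample characters are illustrative assumptions, not something taken from the thread.

```python
import unicodedata

def describe(ch):
    """Return (Unicode name, general category) for a single character."""
    return unicodedata.name(ch), unicodedata.category(ch)

# Characters mentioned in the thread. "No" is "other number", "Lm" is
# "modifier letter", "Sm" is "math symbol", "Ps"/"Pe" are open/close punctuation.
samples = "⁴₂₍₎⁺ⁿ½⑴"
for ch in samples:
    name, category = describe(ch)
    print(f"{ch!r}: {category} {name}")

# str.isidentifier() applies the identifier rules of the running interpreter.
print("x₂".isidentifier())
```

The subscript and superscript digits report the same category (No) as ½ and ⑴, which is why admitting the whole category would also admit those characters.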
On Fri, May 2, 2014 at 3:34 PM, Roman Inflianskas <infroma@gmail.com> wrote:
I would like to use symbols from block "Superscripts and Subscripts"
-1 Python uses the ** operator for what is a superscript in math and the [] operator for what is a subscript. Allowing sub/superscripts in identifiers will create confusion. (It is not uncommon to mix typeset math with Python code in generated documentation.) If you have many identifiers with subscripts, I would recommend using a list or a dictionary and calling them a[1], a[2], etc. instead of a<sub>1, a<sub>2.
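The list-or-dictionary approach suggested above might look like the following sketch. The coefficient values and the names `a` and `weighted_sum` are made up for illustration.

```python
# Subscripted quantities a_1, a_2, a_3 stored in a dict keyed by index,
# so the "subscript" becomes an ordinary [] lookup.
a = {1: 0.5, 2: 1.5, 3: 2.0}

def weighted_sum(coeffs, x):
    """Compute the sum of a_i * x**i over all indices i in coeffs."""
    return sum(a_i * x**i for i, a_i in coeffs.items())

print(a[2])                # the term written with a subscript 2 in typeset math
print(weighted_sum(a, 2))  # 0.5*2 + 1.5*4 + 2.0*8
```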
On 5/2/2014 3:34 PM, Roman Inflianskas wrote:
It's really useful that python 3 allows me to use some Unicode symbols (as specified in https://docs.python.org/3.4/reference/lexical_analysis.html#identifiers), especially Greek symbols for mathematical programs. But when I write mathematical program with lots of indices I would like to use symbols from block "Superscripts and Subscripts" (as id_continue), for example:
⁴₂₍₎
I don't see any problems with allowing yet another subset of Unicode symbols. In Julia, for example, I can use them without problems.
From 2.3. Identifiers and keywords "The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also PEP 3131 for further details." -- Terry Jan Reedy
On 5/2/2014 6:29 PM, Terry Reedy wrote:
On 5/2/2014 3:34 PM, Roman Inflianskas wrote:
It's really useful that python 3 allows me to use some Unicode symbols (as specified in https://docs.python.org/3.4/reference/lexical_analysis.html#identifiers),
especially Greek symbols for mathematical programs. But when I write mathematical program with lots of indices I would like to use symbols from block "Superscripts and Subscripts" (as id_continue), for example:
⁴₂₍₎
I believe 'other numbers' are intentionally omitted.
I don't see any problems with allowing yet another subset of Unicode symbols. In Julia, for example, I can use them without problems.
If the rules for identifiers are expanded, any code that uses newly allowed names cannot be backported or run on previous versions. If contracted, the opposite problem occurs. I do not think they should be changed either way without a strong cause.
From 2.3. Identifiers and keywords "The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also PEP 3131 for further details."
In other words, we use the standard with a few intentional modifications. The 2.x ASCII rules were the same as, or very similar to, those in other languages (such as C). The 3.x rules are similar to those of other languages that follow the same standard. There is a benefit to this. -- Terry Jan Reedy
On Fri, May 02, 2014 at 10:27:56PM -0400, Terry Reedy wrote:
If the rules for identifiers are expanded, any code that uses newly allowed names cannot be backported or run on previous versions. If contracted, the opposite problem occurs. I do not think they should be changed either way without a strong cause.
That applies to any new feature -- code using that feature cannot be easily backported. In this case, it's actually quite simple to backport code using the new rules for identifiers: just change the identifiers. The algorithm used by the code remains the same.
From 2.3. Identifiers and keywords "The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also PEP 3131 for further details."
In other words, we use the standard with a few intentional modifications.
Playing Devil's Advocate, perhaps we could add a few more intentional modifications. While there are advantages to following a standard just for the sake of following a standard, once you allow any changes, you're no longer following the standard. So the argument becomes, why should we allow that change but not this change? Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance, g₁ for sample skewness, or β₂ for Pearson's skewness, to give a few real-world examples. Regular digits may be ambiguous: compare s₁² for the sample variance with Bessel's correction, versus s12. (s twelve?) I'm going to give a tentative +1 vote to allowing superscript and subscript letters and digits in identifiers, if it can be done without excessive cost in complexity or performance. Anything else, like (say) ⑤ (CIRCLED DIGIT FIVE), I will give a firm -1. -- Steven
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2. -- Greg
On Sat, May 3, 2014 at 4:38 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2.
Maybe, but subscripts can be useful. Recently we were discussing linear acceleration on python-list, and the way I learned the principle (other people learned it with different letters) was: Vₜ = V₀t + at²/2 which should translate into Python as: Vₜ = V₀*t + a*t*t/2 (Not sure if people's fonts have all those characters; that's read "V-t equals V-0 t plus a t squared over two".) Being able to use subscripts in identifiers wouldn't be *often* useful, but it would make direct translation from math to code a bit easier. ChrisA
I've actually written programs like that and honestly names like 'sigma' and 'beta' and 'v_t' worked just fine. Many of us have used (x1, y1) and (x2, y2) without confusing anyone because the digits weren't subscripted. The ability to use Unicode in identifiers I'm sure is appreciated by non-English writers but that's a decidedly different issue. This is a solution without an actual problem. --- Bruce (from my phone)
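As a rough illustration of the ASCII spelling discussed here, the formula from the previous message can be written with plain underscore names. The function name `v_t` and the sample values are assumptions chosen for the example; the formula itself is copied as quoted in the thread.

```python
def v_t(v_0, a, t):
    """ASCII spelling of the formula quoted above: V_t = V_0*t + a*t**2/2."""
    return v_0 * t + a * t * t / 2

print(v_t(v_0=2.0, a=4.0, t=3.0))
```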
Bruce Leban, 03.05.2014 09:19:
I've actually written programs like that and honestly names like 'sigma' and 'beta' and 'v_t' worked just fine. Many of us have used (x1, y1) and (x2, y2) without confusing anyone because the digits weren't subscripted.
Plus, the numbers are much easier to read that way than in tiny subscripts. Stefan
On Sat, May 3, 2014 at 5:19 PM, Bruce Leban <bruce@leapyear.org> wrote:
I've actually written programs like that and honestly names like 'sigma' and 'beta' and 'v_t' worked just fine. Many of us have used (x1, y1) and (x2, y2) without confusing anyone because the digits weren't subscripted.
Yeah; like I said, it's not a big thing. I certainly wouldn't choose a language on the basis of subscript-digit-support-in-identifiers. But when I'm working with maths I'm not overly familiar with (stuff a lot more complicated than simple linear acceleration), and I'm trying to translate a not-quite-perfect set of handwritten scribbles into code, every little bit helps. That's why WYSIWYG music editing software is so much more popular with novices than GNU Lilypond is - if you're not *really* familiar with what you're working with, the difference between "dot on the page that looks like this" and "c'8." slows you down. Not insurmountable but the mind glitches across the gap. ChrisA
On Sat, May 03, 2014 at 06:38:21PM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2.
Actually, not really. A better way of putting it is that the standard deviation is "just" the square root of σ². Variance comes first (it's defined from first principles), and then the standard deviation is defined by taking the square root. But really, it doesn't matter which is derived from which. To a mathematician, x² is just as much a legitimate variable as x. One can say that f is a function of x² just as well as saying that it is a function of y, where y happens to equal x². But regardless of philosophical differences regarding the nature of what is or isn't a variable, versus something derived from a variable, it simply is useful to have a one-to-one correspondence between variables in Python code and notation used in mathematics. Is it useful enough to make up for the (minor) issues that others have already mentioned? I think so, but I will understand if others disagree. I think that the ability to distinguish between x² and x₂ can be important, and both x2 and x_2 are poor substitutes. (Of the two, I prefer x2.) But I'm also aware that this is very dependent on the problem domain. I wouldn't use x² and x₂ outside of a mathematical context. -- Steven
Steven D'Aprano writes:
On Sat, May 03, 2014 at 06:38:21PM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2.
Actually, not really. A better way of putting it is that the standard deviation is "just" the square root of σ². Variance comes first (it's defined from first principles), and then the standard deviation is defined by taking the square root.
Thank you for writing that better than I could have. :-)
But really, it doesn't matter which is derived from which. To a mathematician, x² is just as much a legitimate variable as x. One can say that f is a function of x² just as well as saying that it is a function of y, where y happens to equal x².
We part company here. x² (in the usage "function of x²") is not a variable, it's an expression. I don't think I've even seen the usage "f(x²) = ..." in a *definition* of "f", with the single exception of the use of "f(μ,σ²) = ..." in defining the distribution of a random variable, and even then that's unusual (σ is almost always more convenient, even for test statistics). I'd consider that the exception that proves the rule.... Especially in a case like z(x,μ,σ²) = (x - μ)/σ! To put it another way, I suspect you would get rather upset if I used both x and x² in such a context and treated them as I would x and y. Or, if in real analysis I ignored the fact that x² is necessarily non-negative. I could go on, but I think the point is clear: *linguistically* these are expressions, not variables -- they are constructed syntactically, and their semantics can be deduced from the syntax. Of course in mathematics you can treat them as variables (as statisticians do σ²), but that works because in mathematics no symbols or syntax have fixed semantics, not π, not even 0. If you can get a version of Python that has "where ..." clauses in it that can define semantics for sub- and superscript syntax past Guido, I'd be all for this. But I really don't think that's going to happen.<wink/>
Is it useful enough to make up for the (minor) issues that others have already mentioned? I think so, but I will understand if others disagree. I think that the ability to distinguish between x² and x₂ can be important,
Which, I suspect, means these notations don't pass the "generalized grit on Tim's monitor" test.
and both x2 and x_2 are poor substitutes.
In programming (as opposed to the chemistry of nuclear fusion), if you need to distinguish x² from x₂, and x**2 and x[2] don't do the trick, I suspect your notation has real readability problems no matter how you arrange things spatially. I guess that use cases where such usage is in good taste are way too rare to justify this.
On 05/03/2014 05:05 AM, Steven D'Aprano wrote:
On Sat, May 03, 2014 at 06:38:21PM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2.
Actually, not really. A better way of putting it is that the standard deviation is "just" the square root of σ². Variance comes first (it's defined from first principles), and then the standard deviation is defined by taking the square root.
The main problem I see is that many possible questions come to mind rather than one simple or obvious interpretation. Cheers, Ron
On Sat, May 03, 2014 at 11:39:23AM -0400, Ron Adam wrote:
On 05/03/2014 05:05 AM, Steven D'Aprano wrote:
On Sat, May 03, 2014 at 06:38:21PM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2.
Actually, not really. A better way of putting it is that the standard deviation is "just" the square root of σ². Variance comes first (it's defined from first principles), and then the standard deviation is defined by taking the square root.
The main problem I see is that many possible questions come to mind rather than one simple or obvious interpretation.
If I name a variable "x2", what is the "one simple or obvious interpretation" that such an identifier presumably has? If standard, ASCII-only identifiers don't have a single interpretation, why should identifiers like σ² be held to that requirement? Like any other identifier, one needs to interpret the name in context. Identifiers can be idiomatic ("i" for a loop variable, "c" for a character), more or less descriptive ("number_of_pages", "npages"), or obfuscated ("e382702"). They can be written in English, or in some other language. They can be ordinary words, or jargon that only means something to those who understand the problem domain. None of this will be different if sub/superscript digits and letters are allowed. One of the frustrations on this list is how often people hold new proposals to a higher standard than existing features. Particularly *impossible* standards. It simply isn't possible for characters like superscript-two to be given a *single* interpretation (although there is an obvious one, namely "squared") any more than it is possible for the letter "a" to be given a *single* interpretation. There are valid objections to this proposal. It may be that the effort needed to allow code points like ² in identifiers without also allowing ½ or ② may be too great. Or the performance cost is too high. Or the benefit for mathematical-style code doesn't justify adding additional language complexity. Or even a purely aesthetic judgement "I just don't like it". (I don't like identifiers written in Cyrillic, because I can't read them, but I'm not the target audience for such identifiers and I will never need to read them. Consequently I don't object if other people use Cyrillic identifiers in their personal code.) Holding this proposal up to an impossible standard which plain ASCII identifiers don't even meet is simply not cricket. Thank you all for letting me get that off my chest, and apologies to Ron for singling him out. -- Steven
On Sun, May 4, 2014 at 3:57 AM, Steven D'Aprano <steve@pearwood.info> wrote:
One of the frustrations on this list is how often people hold new proposals to higher standard than existing features. ...
Holding this proposal up to an impossible standard which plain ASCII identifiers don't even meet is simply not cricket.
Thank you all for letting me get that off my chest, and apologies to Ron for singling him out.
A fair point in this case, and yet there is such a thing as the grandfather clause. Adding something to the language has a much higher bar than merely retaining something (because *removing* something from the language has an even higher bar), so a proposal can't simply say "It's no worse than what we have already" to get acceptance. Impossible standard? A bit unfair. Higher than existing features? Quite possibly has its place. ChrisA
Steven D'Aprano writes:
If I name a variable "x2", what is the "one simple or obvious interpretation" that such an identifier presumably has? If standard, ASCII-only identifiers don't have a single interpretation, why should identifiers like σ² be held to that requirement?
Because subscripts and superscripts are syntactic constructs, and naturally decompose into two identifiers in a specific relationship (even if that relationship cannot be further specified without going deep into some domain of discourse) -- and that is much of the motivation for wanting to use them. "x2" does not carry that load. Note that Unicode itself considers them *compatibility* characters and says: Superscripts and subscripts have been included in the Unicode Standard only to provide compatibility with existing character sets. In general, the Unicode character encoding does not attempt to describe the positioning of a character above or below the baseline in typographical layout. In other words, Unicode is reluctant to guarantee that x2, x², and x₂ are actually different identifiers! It's considered bad practice to treat them as the same, but not actually forbidden. At least 2 technical reports (#20 and #25) discourage their use except in the case where they are letter-like (phonetic transcriptions use several such letters, where they have different meaning from their compatibility equivalents). The more I look into this, the more I think it is really problematic.
On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
Note that Unicode itself considers them *compatibility* characters and says:
Superscripts and subscripts have been included in the Unicode Standard only to provide compatibility with existing character sets. In general, the Unicode character encoding does not attempt to describe the positioning of a character above or below the baseline in typographical layout.
In other words, Unicode is reluctant to guarantee that x2, x², and x₂ are actually different identifiers! [...]
I don't think this is a valid interpretation of what the Unicode standard is trying to say, but the point is moot. I think you've just identified (pun intended) a major objection to the proposal, one serious enough to change my mind from limited support to opposition.
Python identifiers are treated by their NFKC normalised form:
All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
https://docs.python.org/3/reference/lexical_analysis.html
And superscripts and subscripts normalise to standard characters:
py> import unicodedata
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']
So that categorically rules out allowing superscripts and subscripts as *distinct* characters in identifiers. So even if they were allowed, it would mean that x² and x₂ would be treated as the same identifier as x2.
For my use-case, I would want x² and x₂ to be treated as distinct identifiers, not just as a funny way of writing x2. So from my perspective, *at best* there is now insufficient benefit to bother allowing them.
It's actually stronger than that: allowing superscripts and subscripts would be an attractive nuisance for my use-case. If they were allowed, I would be tempted to write x² and x₂, which could end up being a subtle source of bugs if I accidentally used them both in the same namespace, thinking that they were distinct when they actually aren't. So I am now -1 on allowing superscripts and subscripts. -- Steven
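The "subtle source of bugs" described above can be made concrete by grouping candidate names by their NFKC form. This is a sketch only: the helper `nfkc_collisions` is a hypothetical name, and it works on plain strings rather than on real identifiers, since the superscript and subscript digits are not accepted in identifiers.

```python
import unicodedata
from collections import defaultdict

def nfkc_collisions(names):
    """Group names by NFKC form; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFKC", name)].append(name)
    return {norm: spellings for norm, spellings in groups.items() if len(spellings) > 1}

# x², x₂ and x2 all fall into the same group; σ and sigma stay separate.
print(nfkc_collisions(["x²", "x₂", "x2", "σ", "sigma"]))
```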
On Sunday 04 May 2014 12:40:44 Steven D'Aprano wrote:
On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
Note that Unicode itself considers them *compatibility* characters and says:
Superscripts and subscripts have been included in the Unicode Standard only to provide compatibility with existing character sets. In general, the Unicode character encoding does not attempt to describe the positioning of a character above or below the baseline in typographical layout.
In other words, Unicode is reluctant to guarantee that x2, x², and x₂ are actually different identifiers! [...]
I don't think this is a valid interpretation of what the Unicode standard is trying to say, but the point is moot. I think you've just identified (pun intended) a major objection to the proposal, one serious enough to change my mind from limited support to opposition.
Python identifiers are treated by their NFKC normalised form:
All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
https://docs.python.org/3/reference/lexical_analysis.html
And superscripts and subscripts normalise to standard characters:
py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
['x2', 'x2', 'x2']
So that categorically rules out allowing superscripts and subscripts as *distinct* characters in identifiers. So even if they were allowed, it would mean that x² and x₂ would be treated as the same identifier as x2.
For my use-case, I would want x² and x₂ to be treated as distinct identifiers, not just as a funny way of writing x2. So from my perspective, *at best* there is now insufficient benefit to bother allowing them.
It's actually stronger than that: allowing superscripts and subscripts would be an attractive nuisance for my use-case. If they were allowed, I would be tempted to write x² and x₂, which could end up being a subtle source of bugs if I accidentally used them both in the same namespace, thinking that they were distinct when they actually aren't. So I am now -1 on allowing superscripts and subscripts.
That's the strongest point against allowing superscripts and subscripts in the whole discussion, IMHO. I would want x² and x₂ to be treated as distinct identifiers too. I've tried this use case in Julia and it works:
julia> x₂ = 1
1
julia> x² = 2
2
julia> x₂
1
julia> x²
2
But then I found a thread in Julia's bug tracker covering Unicode identifier normalization [1]. As I understand it, they don't use NFKC. As a consequence, the symbols "μ" (U+03BC, GREEK SMALL LETTER MU) and "µ" (U+00B5, MICRO SIGN) are treated as different. They recognize that this is weird and that they need to do something about it. Some of them don't want to use NFKC for the same reason as above (plus, for example, "H" and "ℍ" would become equal identifiers). Others decided to give a warning when a new identifier is equal to an already defined one (in terms of NFKC normalization).
Now I understand that things are more complicated than I thought when I made the proposal. I think that there is no "good way" to add support for subscripts and superscripts. So it's better to leave the situation as is. -- Regards, Roman Inflianskas -------- [1] covering unicode identifiers normalization
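The two look-alike pairs mentioned in this message can be compared from Python using explicit code points, which avoids any ambiguity about which glyph is which. This is a small sketch; the constant names and the helper `same_after_nfkc` are illustrative, and it says nothing about how Julia itself handles these characters.

```python
import unicodedata

MICRO_SIGN = "\u00b5"       # µ, MICRO SIGN
GREEK_MU = "\u03bc"         # μ, GREEK SMALL LETTER MU
DOUBLE_STRUCK_H = "\u210d"  # ℍ, DOUBLE-STRUCK CAPITAL H

def same_after_nfkc(a, b):
    """Return True if the two strings are equal once NFKC-normalized."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

print(MICRO_SIGN == GREEK_MU)                  # distinct code points
print(same_after_nfkc(MICRO_SIGN, GREEK_MU))   # merged by NFKC
print(same_after_nfkc(DOUBLE_STRUCK_H, "H"))   # merged by NFKC as well
```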
On 5/4/2014 3:10 AM, Roman Inflianskas wrote:
Now I understand that things are more complicated than I thought when I made the proposal. I think that there is no "good way" to add support for subscripts and superscripts. So it's better to leave the situation as is.
If you are the one who opened the tracker issue, please close it. And thanks for bringing the discussion here. -- Terry Jan Reedy
On Sunday 04 May 2014 05:51:25 Terry Reedy wrote:
On 5/4/2014 3:10 AM, Roman Inflianskas wrote:
Now I understand that things are more complicated than I thought when I made the proposal. I think that there is no "good way" to add support for subscripts and superscripts. So it's better to leave the situation as is.
If you are the one who opened the tracker issue, please close it. And thanks for bringing the discussion here.
Done. Thank you for participating in this discussion. Next time I will not open a bug before discussion, I promise :) -- Regards, Roman Inflianskas
On 05/03/2014 01:57 PM, Steven D'Aprano wrote:
On Sat, May 03, 2014 at 11:39:23AM -0400, Ron Adam wrote:
On 05/03/2014 05:05 AM, Steven D'Aprano wrote:
On Sat, May 03, 2014 at 06:38:21PM +1200, Greg Ewing wrote:
Steven D'Aprano wrote:
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance,
Having σ² be a variable name could be confusing. To a mathematician, it's not a distinct variable, it's just σ ** 2.
Actually, not really. A better way of putting it is that the standard deviation is "just" the square root of σ². Variance comes first (it's defined from first principles), and then the standard deviation is defined by taking the square root.
The main problem I see is that many possible questions come to mind rather than one simple or obvious interpretation.
If I name a variable "x2", what is the "one simple or obvious interpretation" that such an identifier presumably has? If standard, ASCII-only identifiers don't have a single interpretation, why should identifiers like σ² be held to that requirement?
Stephen Turnbull pointed out some of the different interpretations I was thinking about in his reply to this message. Mainly that of it being more of a syntactic form, but as you said it also might be interpreted as an identifier spelling.
Like any other identifier, one needs to interpret the name in context. Identifiers can be idiomatic ("i" for a loop variable, "c" for a character), more or less descriptive ("number_of_pages", "npages"), or obfuscated ("e382702"). They can be written in English, or in some other language. They can be ordinary words, or jargon that only means something to those who understand the problem domain. None of this will be different if sub/superscript digits and letters are allowed.
One of the frustrations on this list is how often people hold new proposals to a higher standard than existing features. Particularly *impossible* standards. It simply isn't possible for characters like superscript-two to be given a *single* interpretation (although there is an obvious one, namely "squared") any more than it is possible for the letter "a" to be given a *single* interpretation.
There are valid objections to this proposal. It may be that the effort needed to allow code points like ² in identifiers without also allowing ½ or ② may be too great. Or the performance cost is too high. Or the benefit for mathematical-style code doesn't justify adding additional language complexity.
Or even a purely aesthetic judgement "I just don't like it". (I don't like identifiers written in Cyrillic, because I can't read them, but I'm not the target audience for such identifiers and I will never need to read them. Consequently I don't object if other people use Cyrillic identifiers in their personal code.)
Holding this proposal up to an impossible standard which plain ASCII identifiers don't even meet is simply not cricket.
Thank you all for letting me get that off my chest, and apologies to Ron for singling him out.
No problem, you didn't comment on me, but expressed your own thoughts. That's fine. But thanks for clarifying the context of your message; it does help us avoid unintended misunderstandings in message-based conversations like these, where we don't get to hear the tone of a message. I feel the same as you describe here in many of these discussions. Enough so that I'm attempting to write a minimal language that uses some of the features I've thought about. The exercise was/is helping me understand many of the lower-level language-design patterns in Python and some other languages. Some of the ideas I've wanted just don't fit with Python's design, and some would work, but not without many changes to other parts. And some ideas we can't do because they directly conflict with something we already have. Sigh. The ones that most interest me are the ones that simplify or unify existing features, but those are also the ones that are the hardest to do right. ;-) Cheers, Ron
On 5/3/2014 12:50 AM, Steven D'Aprano wrote:
On Fri, May 02, 2014 at 10:27:56PM -0400, Terry Reedy wrote:
If the rules for identifiers are expanded, any code that uses newly allowed names cannot be backported or run on previous versions. If contracted, the opposite problem occurs. I do not think they should be changed either way without a strong cause.
That applies to any new feature -- code using that feature cannot be easily backported. In this case, it's actually quite simple to backport code using the new rules for identifiers: just change the identifiers. The algorithm used by the code remains the same.
It appears that I consider lexicography more 'fundamental' in some sense than you do. But let's skip over this.
From 2.3. Identifiers and keywords "The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also PEP 3131 for further details."
Without reading the annex, I cannot tell which part of the 'below' actually defines a 'change', as opposed to an 'elaboration' (explanation). I have no idea whether the unknown changes are additions, deletions, or merely selections of options.
In other words, we use the standard with a few intentional modifications.
Playing Devil's Advocate, perhaps we could add a few more intentional modifications.
Or perhaps not, depending on what the modifications actually are and what the reasons were.
While there are advantages to following a standard just for the sake of following a standard, once you allow any changes, you're no longer following the standard. So the argument becomes, why should we allow that change but not this change?
Nick recently argued, very similarly, that having restored string 'u' prefixes was a reason to restore dict.iterxyz methods. You agreed with me that there were good reasons why B did not follow from A. To properly compare current and proposed changes, we must know the current 'modifications and changes', their reasons and effects, and the proposed changes and their reasons (any real parallels) and likely effects. If you were to do the research, I would be willing to discuss.
Particularly for mathematically-focused code, I think it would be useful to be able to use identifiers like (say) σ² for variance, g₁ for sample skewness, or β₂ for Pearson's skewness, to give a few real-world examples. Regular digits may be ambiguous: compare s₁² for the sample variance with Bessel's correction, versus s12. (s twelve?)
I agree that there are good uses for this restricted set of additions. Would you allow super/subscripts as prefixes rather than suffixes? I presume not since we already disallow initial numbers.
I'm going to give a tentative +1 vote to allowing superscript and subscript letters and digits in identifiers, if it can be done without excessive cost in complexity or performance.
Would you consider doubling the cost of checking each character (a reasonable estimate, I think) excessive or not?
Anything else, like (say) ⑤ (CIRCLED DIGIT FIVE), I will give a firm -1.
-- Terry Jan Reedy
On 5/3/2014 5:48 PM, Terry Reedy wrote:
Would you consider doubling the cost of checking each character (a reasonable estimate, I think) excessive or not?
Thinking about it more, I think double is an over-estimate. Since I do not know how the unicode lexer works, I won't guess or worry about the cost until someone times code with and without the change. -- Terry Jan Reedy
On Sat, May 3, 2014 at 5:48 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Would you allow super/subscripts as prefixes rather than suffixes? I presume not since we already disallow initial numbers.
Python 3 does not recognize subscripts as numbers:
>>> int('₂')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '₂'
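The distinction behind this traceback can be explored with the string predicates and `unicodedata`. The sketch below uses the variable name `sub_two` purely for illustration; it shows that the subscript digit carries a digit value in the Unicode database without being a decimal digit in the sense that `int()` requires.

```python
import unicodedata

sub_two = "\u2082"  # ₂, SUBSCRIPT TWO

print(sub_two.isdecimal())         # decimal digits are what int() accepts
print(sub_two.isdigit())           # the broader "digit" property
print(unicodedata.digit(sub_two))  # digit value recorded in the Unicode database

try:
    int(sub_two)
except ValueError as exc:
    print("int() rejects it:", exc)

# After NFKC normalization the character becomes an ordinary ASCII digit.
print(int(unicodedata.normalize("NFKC", sub_two)))
```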
participants (10)
- Alexander Belopolsky
- Bruce Leban
- Chris Angelico
- Greg Ewing
- Roman Inflianskas
- Ron Adam
- Stefan Behnel
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy