Overloading unary plus in strings with "ord"

So `ord` is already a really fast function with (last check before this thread was posted) 166 nsec per loop. But I'm wondering... doing `ord(a)` produces this bytecode:
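A minimal sketch for inspecting that bytecode with the standard `dis` module, assuming CPython. Opcode names differ between versions, so no output is shown. The helper name `loaded_names` is made up for illustration; it lists the names each expression has to look up at run time.

```python
import dis

def loaded_names(source):
    """Names that the compiled expression looks up at run time, in order."""
    code = compile(source, "<example>", "eval")
    return [
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname in ("LOAD_NAME", "LOAD_GLOBAL")
    ]

if __name__ == "__main__":
    # Full disassembly of both forms; the listing depends on the interpreter version.
    dis.dis(compile("ord(a)", "<example>", "eval"))
    dis.dis(compile("+a", "<example>", "eval"))
    # The call form has to look up `ord` as well as `a`; the operator form only `a`.
    print(loaded_names("ord(a)"))
    print(loaded_names("+a"))
```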

On Tue, Oct 12, 2021 at 10:41 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
-1. It's an unnecessary optimization for an uncommon case, an abuse of syntax (it's even worse than JavaScript using +"123" to force it to be a number), and illogical: why should +"a" be the integer 97?
ChrisA

On Tue, Oct 12, 2021 at 11:36:42PM +1100, Chris Angelico wrote:
> You haven't given any reason why unary plus should imply ord().
I think the question Chris is really asking is why should unary plus return ord() rather than any other function or method. We could make unary plus of a string equal to the upper() method:
    +"Hello world"  # returns "HELLO WORLD"
or the strip() method:
    +" Hello world "  # returns "Hello world"
or len():
    +"Hello world"  # returns 11
or any other function or method we want.
What is so special about ord(), and what is the connection between ord() and `+` that makes it obvious that +"a" should return 97 rather than "A" or 1 or 10 or something else? It's not enough to just say that unary plus is unused for strings, you have to justify why the average programmer will look at unary plus and immediately think "ord".
-- Steve
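Python already lets a str subclass give unary plus any of these meanings through `__pos__`, which shows how arbitrary the choice is. A minimal sketch; the class names are invented for illustration, and nothing here is proposed behaviour for the built-in str.

```python
class UpperPlus(str):
    """Hypothetical subclass: unary plus means upper()."""
    def __pos__(self):
        return self.upper()

class LenPlus(str):
    """Hypothetical subclass: unary plus means len()."""
    def __pos__(self):
        return len(self)

class OrdPlus(str):
    """Hypothetical subclass: unary plus means ord()."""
    def __pos__(self):
        return ord(self)

if __name__ == "__main__":
    # Same operator, three unrelated results; the syntax gives no hint which one applies.
    print(+UpperPlus("Hello world"))
    print(+LenPlus("Hello world"))
    print(+OrdPlus("a"))
```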

On Tue, Oct 12, 2021 at 9:40 AM MRAB <python@mrabarnett.plus.com> wrote:
I would "strengthen" it further by suggesting swapcase for the squiggle operator:
    >>> ~"Lime Cordial Delicious"
    'lIME cORDIAL dELICIOUS'
And title case for the caret:
    >>> ^"lime cordial delicious"
    'Lime Cordial Delicious'
So many shortcuts! Think of the line space savings.
--- Ricky.
"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

Using the caret as a prefix unary operator would require changes to the Python grammar. For now, stick to implementing existing operators. The rest of the ideas are good, though.

On Wed, Oct 13, 2021 at 9:15 AM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
> Using the caret as a prefix unary operator would require changes to the Python grammar. For now, stick to implementing existing operators. The rest of the ideas are good, though.
You may need to get your sensors tuned up, as not one of those ideas was intended to be taken seriously. We do not need to find meanings for every operator, especially not completely arbitrary ones. Mathematicians and programmers both extend operators to new meanings, but only where it makes sense. For instance, you might interpret multiplication as repeated addition ("3 times 5 means 5 plus 5 plus 5"), and then logically interpret string-times-integer multiplication the same way ("3 times spam means spam plus spam plus spam"), and Python indeed agrees with that:
    >>> 3 * "spam"
    'spamspamspam'
But if you can't justify it with something like that, then it's usually a bad idea. Unary plus meaning ord() has no justification from mathematics or other parts of Python (or, if it does, I haven't heard it), so there is no reason for it to exist. Of course, it's also possible that your entire thread here is *itself* a parody, in which case, I apologise for not noticing it.
ChrisA
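The repeated-addition reading can be checked directly. A small sketch, assuming a made-up helper `repeat_by_addition` that uses only `+`; it gives the same answer as `*` for both integers and strings.

```python
def repeat_by_addition(item, times):
    """Compute `times * item` using only +, starting from an empty value of the same type."""
    result = type(item)()  # 0 for int, "" for str
    for _ in range(times):
        result = result + item
    return result

if __name__ == "__main__":
    print(repeat_by_addition(5, 3) == 3 * 5)
    print(repeat_by_addition("spam", 3) == 3 * "spam")
```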

On Wed, Oct 13, 2021 at 09:22:09AM +1100, Chris Angelico wrote:
> Mathematicians and programmers both extend operators to new meanings, but only where it makes sense.
In fairness, mathematicians typically just invent new symbols, when they're not repurposing existing symbols for totally unrelated ideas :-( I count over 400 maths symbols in Unicode, including over 100 variations on < and > and over 30 variations of the plus sign. So we want to be *really* cautious about following the lead of mathematicians.
-- Steve

On Tue, Oct 12, 2021 at 06:34:06AM -0000, Jeremiah Vivian wrote:
Don't be fooled though: the UNARY_POSITIVE byte-code has to inspect the argument `a` for a `__pos__` method, and if it exists, call it. So there are a couple of hidden function calls in there. But it is true that operators do save the cost of looking up the function name.
If the lookup of a function is having a significant cost to you, perhaps because you are calling the function in a really tight loop, there is an optimization you can do. Suppose we have an enormous string with a billion characters, and we run this:
    a = 0
    for c in enormous_string:
        a += ord(c)
That looks up ord once for every character in the string. But if we move the code to a function, and use a local variable, we can reduce those lookups to one instead of a billion:
    def func(enormous_string):
        ordinal = ord
        a = 0
        for c in enormous_string:
            a += ordinal(c)
        return a
    a = func(enormous_string)
That may give you a small percentage boost.
> And also, the unary `+` of strings only copies strings, which should be redundant in most cases.
Are you sure about that?
    >>> s = +"a"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bad operand type for unary +: 'str'
I don't think it works. What version of Python are you using?
-- Steve
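Both points can be tried out with a short script. This is a sketch under stated assumptions: the function names and the sample string are invented for illustration, timings depend on the machine and the interpreter version (recent CPython releases cache global lookups, so the gap may be small or absent), and no timing output is claimed.

```python
import timeit

def sum_ord_global(text):
    total = 0
    for c in text:
        total += ord(c)  # `ord` is resolved as a global/builtin name on each iteration
    return total

def sum_ord_local(text):
    ordinal = ord  # resolve the name once and keep it in a local variable
    total = 0
    for c in text:
        total += ordinal(c)
    return total

def unary_plus_supported(value):
    """Return True if `+value` works, False if it raises TypeError."""
    try:
        +value
    except TypeError:
        return False
    return True

if __name__ == "__main__":
    sample = "Hello world" * 10_000  # arbitrary test data
    for fn in (sum_ord_global, sum_ord_local):
        seconds = timeit.timeit(lambda: fn(sample), number=20)
        print(f"{fn.__name__}: {seconds:.3f} s")
    print("unary + on str supported:", unary_plus_supported("a"))
```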

The discussion has expanded from just unary plus to all unary operators, so I'll post another thread about unary operators for strings.

participants (5)
- Chris Angelico
- Jeremiah Vivian
- MRAB
- Ricky Teachey
- Steven D'Aprano