Overloading unary plus in strings with "ord"
So `ord` is already a really fast function with (last check before this thread was posted) 166 nsec per loop. But I'm wondering... doing `ord(a)` produces this bytecode:

      2           4 LOAD_NAME                1 (ord)
                  6 LOAD_NAME                0 (a)
                  8 CALL_FUNCTION            1
                 10 POP_TOP
                 12 LOAD_CONST               1 (None)
                 14 RETURN_VALUE

But doing `+a` only produces this:

      2           4 LOAD_NAME                0 (a)
                  6 UNARY_POSITIVE
                  8 POP_TOP
                 10 LOAD_CONST               1 (None)
                 12 RETURN_VALUE

So an operator has its own bytecode, doesn't need to `LOAD_*` a function, and doesn't have the impact on performance when converting arguments to the format of the (after `POP()`ing every argument) TOS function and then calling that function. And also, the unary `+` of strings only copies strings, which should be redundant in most cases. Maybe `ord` can take the place of unary `+` in strings?
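For readers who want to reproduce the comparison, here is a minimal sketch using the standard `dis` module. The helper name `loaded_names` is made up for this example. Opcode names vary between CPython versions (newer releases no longer have `CALL_FUNCTION` or a dedicated `UNARY_POSITIVE` opcode), so the listing in the post should be read as specific to the version it was produced on.

```python
import dis


def loaded_names(source):
    """Return the names loaded via LOAD_NAME when evaluating `source`."""
    code = compile(source, "<example>", "eval")
    return [
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname == "LOAD_NAME"
    ]


# The call has to load both the function and its argument ...
call_names = loaded_names("ord(a)")
# ... while the operator only loads its operand.
plus_names = loaded_names("+a")

print(call_names)
print(plus_names)

# Full listings for whichever interpreter runs this.
dis.dis(compile("ord(a)", "<example>", "eval"))
dis.dis(compile("+a", "<example>", "eval"))
```

This only shows that the operator form skips one name lookup. It says nothing about how much time that lookup actually costs.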
On Tue, Oct 12, 2021 at 10:41 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
> So `ord` is already a really fast function with (last check before this thread was posted) 166 nsec per loop. But I'm wondering... doing `ord(a)` produces this bytecode:
>
>       2           4 LOAD_NAME                1 (ord)
>                   6 LOAD_NAME                0 (a)
>                   8 CALL_FUNCTION            1
>                  10 POP_TOP
>                  12 LOAD_CONST               1 (None)
>                  14 RETURN_VALUE
>
> But doing `+a` only produces this:
>
>       2           4 LOAD_NAME                0 (a)
>                   6 UNARY_POSITIVE
>                   8 POP_TOP
>                  10 LOAD_CONST               1 (None)
>                  12 RETURN_VALUE
>
> So an operator has its own bytecode, doesn't need to `LOAD_*` a function, and doesn't have the impact on performance when converting arguments to the format of the (after `POP()`ing every argument) TOS function and then calling that function. And also, the unary `+` of strings only copies strings, which should be redundant in most cases. Maybe `ord` can take the place of unary `+` in strings?
-1. It's unnecessary optimization for an uncommon case, abuse of syntax (it's even worse than JavaScript using +"123" to force it to be a number), and illogical - why should +"a" be the integer 97?
ChrisA
> -1. It's unnecessary optimization for an uncommon case, abuse of syntax
Good point. But what else can the unary positive do? I'm just trying to add a use for it.
> illogical - why should +"a" be the integer 97?
Because `ord("a")` is `97`. Have you read the last question at the end of the post?
On Tue, Oct 12, 2021 at 11:27 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
>> -1. It's unnecessary optimization for an uncommon case, abuse of syntax
> Good point. But what else can the unary positive do? I'm just trying to add a use for it.
>> illogical - why should +"a" be the integer 97?
> Because `ord("a")` is `97`. Have you read the last question at the end of the post?
And eval("0xa") is 10. Why shouldn't +"a" be 10 instead? You haven't given any reason why unary plus should imply ord().
ChrisA
On Tue, Oct 12, 2021 at 11:36:42PM +1100, Chris Angelico wrote:
> You haven't given any reason why unary plus should imply ord().
I think the question Chris is really asking is why should unary plus return ord() rather than any other function or method.
We could make unary plus of a string equal to the upper() method:
    +"Hello world" # returns "HELLO WORLD"
or the strip() method:
    +" Hello world " # returns "Hello world"
or len():
    +"Hello world" # returns 11
or any other function or method we want. What is so special about ord(), and what is the connection between ord() and `+` that makes it obvious that +"a" should return 97 rather than "A" or 1 or 10 or something else?
It's not enough to just say that unary plus is unused for strings, you have to justify why the average programmer will look at unary plus and immediately think "ord".
--
Steve
On 2021-10-12 13:49, Steven D'Aprano wrote:
> On Tue, Oct 12, 2021 at 11:36:42PM +1100, Chris Angelico wrote:
>> You haven't given any reason why unary plus should imply ord().
> I think the question Chris is really asking is why should unary plus return ord() rather than any other function or method.
> We could make unary plus of a string equal to the upper() method:
>     +"Hello world" # returns "HELLO WORLD"
You could then strengthen that suggestion by saying the unary minus would be equivalent to the lower() method.
> or the strip() method:
>     +" Hello world " # returns "Hello world"
> or len():
>     +"Hello world" # returns 11
> or any other function or method we want. What is so special about ord(), and what is the connection between ord() and `+` that makes it obvious that +"a" should return 97 rather than "A" or 1 or 10 or something else?
> It's not enough to just say that unary plus is unused for strings, you have to justify why the average programmer will look at unary plus and immediately think "ord".
On Tue, Oct 12, 2021 at 9:40 AM MRAB <python@mrabarnett.plus.com> wrote:
> On 2021-10-12 13:49, Steven D'Aprano wrote:
>> On Tue, Oct 12, 2021 at 11:36:42PM +1100, Chris Angelico wrote:
>>> You haven't given any reason why unary plus should imply ord().
>> I think the question Chris is really asking is why should unary plus return ord() rather than any other function or method.
>> We could make unary plus of a string equal to the upper() method:
>>     +"Hello world" # returns "HELLO WORLD"
> You could then strengthen that suggestion by saying the unary minus would be equivalent to the lower() method.
I would "strengthen" it further by suggesting swapcase for the squiggle operator:
    >>> ~"Lime Cordial Delicious"
    'lIME cORDIAL dELICIOUS'
And title case for the caret:
    >>> ^"lime cordial delicious"
    'Lime Cordial Delicious'
So many shortcuts! Think of the line space savings.
---
Ricky.
"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
Using the caret as a prefix unary operator would require changes in python grammar. For now, stick to implementing existing operators. But the rest of the ideas are good though.
On Wed, Oct 13, 2021 at 9:15 AM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
> Using the caret as a prefix unary operator would require changes in python grammar. For now, stick to implementing existing operators. But the rest of the ideas are good though.
You may need to get your sensors tuned up, as not one of those ideas was intended to be taken seriously. We do not need to find meanings for every operator, especially not completely arbitrary ones.
Mathematicians and programmers both extend operators to new meanings, but only where it makes sense. For instance, you might interpret multiplication as repeated addition ("3 times 5 means 5 plus 5 plus 5"), and then logically interpret string-times-integer multiplication the same way ("3 times spam means spam plus spam plus spam"), and Python indeed agrees with that:
    >>> 3 * "spam"
    'spamspamspam'
But if you can't justify it with something like that, then it's usually a bad idea. Unary plus meaning ord() has no justification from mathematics or other parts of Python (or, if it does, I haven't heard it), so there is no reason for it to exist.
Of course, it's also possible that your entire thread here is *itself* a parody, in which case, I apologise for not noticing it.
ChrisA
On Wed, Oct 13, 2021 at 09:22:09AM +1100, Chris Angelico wrote:
> Mathematicians and programmers both extend operators to new meanings, but only where it makes sense.
In fairness, mathematicians typically just invent new symbols, when they're not repurposing existing symbols for totally unrelated ideas :-(
I count over 400 maths symbols in Unicode, including over 100 variations on < and > and over 30 variations of the plus sign.
So we want to be *really* cautious about following the lead of mathematicians.
--
Steve
On 2021-10-12 23:57, Steven D'Aprano wrote:
> On Wed, Oct 13, 2021 at 09:22:09AM +1100, Chris Angelico wrote:
>> Mathematicians and programmers both extend operators to new meanings, but only where it makes sense.
> In fairness, mathematicians typically just invent new symbols, when they're not repurposing existing symbols for totally unrelated ideas :-(
> I count over 400 maths symbols in Unicode, including over 100 variations on < and > and over 30 variations of the plus sign.
> So we want to be *really* cautious about following the lead of mathematicians.
If you want compact code, try APL.
On Wed, Oct 13, 2021 at 10:02 AM Steven D'Aprano <steve@pearwood.info> wrote:
> On Wed, Oct 13, 2021 at 09:22:09AM +1100, Chris Angelico wrote:
>> Mathematicians and programmers both extend operators to new meanings, but only where it makes sense.
> In fairness, mathematicians typically just invent new symbols, when they're not repurposing existing symbols for totally unrelated ideas :-(
> I count over 400 maths symbols in Unicode, including over 100 variations on < and > and over 30 variations of the plus sign.
> So we want to be *really* cautious about following the lead of mathematicians.
Analytic continuation is basically "hey, let's imagine what this would be like if we did this where it doesn't make sense". And then, oddly, it makes sense anyway.
ChrisA
On 2021-10-12 13:36, Chris Angelico wrote:
> On Tue, Oct 12, 2021 at 11:27 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
>>> -1. It's unnecessary optimization for an uncommon case, abuse of syntax
>> Good point. But what else can the unary positive do? I'm just trying to add a use for it.
>>> illogical - why should +"a" be the integer 97?
>> Because `ord("a")` is `97`. Have you read the last question at the end of the post?
> And eval("0xa") is 10. Why shouldn't +"a" be 10 instead?
Why are you using eval? int is safer:
    int("0xa", 0)
> You haven't given any reason why unary plus should imply ord().
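As a small illustration of the `int(..., 0)` suggestion: with base 0, `int` reads the prefix of the literal to pick the base, and it refuses anything that is not an integer literal, whereas `eval` would execute arbitrary expressions. The helper name `parse_literal` is hypothetical, used only for this sketch.

```python
def parse_literal(text):
    """Parse an integer literal, letting the prefix (0x, 0o, 0b) choose the base."""
    return int(text, 0)


print(parse_literal("0xa"))    # hexadecimal
print(parse_literal("0b101"))  # binary
print(parse_literal("42"))     # plain decimal

# Unlike eval, int() does not evaluate expressions; it raises ValueError.
try:
    parse_literal("1 + 1")
except ValueError as exc:
    print("rejected:", exc)
```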
On 2021-10-12 13:25, Jeremiah Vivian wrote:
>> -1. It's unnecessary optimization for an uncommon case, abuse of syntax
> Good point. But what else can the unary positive do? I'm just trying to add a use for it.
>> illogical - why should +"a" be the integer 97?
> Because `ord("a")` is `97`. Have you read the last question at the end of the post?
It would be very surprising for unary plus to return something of a different type to its argument. And, anyway, why should it be equivalent to 'ord' and not 'int'?
On Tue, Oct 12, 2021 at 06:34:06AM -0000, Jeremiah Vivian wrote:
> So `ord` is already a really fast function with (last check before this thread was posted) 166 nsec per loop. But I'm wondering... doing `ord(a)` produces this bytecode:
>
>       2           4 LOAD_NAME                1 (ord)
>                   6 LOAD_NAME                0 (a)
>                   8 CALL_FUNCTION            1
>                  10 POP_TOP
>                  12 LOAD_CONST               1 (None)
>                  14 RETURN_VALUE
>
> But doing `+a` only produces this:
>
>       2           4 LOAD_NAME                0 (a)
>                   6 UNARY_POSITIVE
>                   8 POP_TOP
>                  10 LOAD_CONST               1 (None)
>                  12 RETURN_VALUE
Don't be fooled though, the UNARY_POSITIVE byte-code has to inspect the argument `a` for a `__pos__` method, and if it exists, call it. So there are a couple of hidden function calls in there. But it is true that operators do save the cost of looking up the function name.
> So an operator has its own bytecode, doesn't need to `LOAD_*` a function, and doesn't have the impact on performance when converting arguments to the format of the (after `POP()`ing every argument) TOS function and then calling that function.
If the lookup of a function is having a significant cost to you, perhaps because you are calling the function in a really tight loop, there is an optimization you can do. Suppose we have an enormous string with a billion characters, and we run this:

    a = 0
    for c in enormous_string:
        a += ord(c)

That looks up ord once for every character in the string. But if we move the code to a function, and use a local variable, we can reduce those lookups to one instead of a billion:

    def func(enormous_string):
        ordinal = ord
        a = 0
        for c in enormous_string:
            a += ordinal(c)
        return a

    a = func(enormous_string)

That may give you a small percentage boost.
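One way to check whether the local alias helps on a given interpreter is to time both variants with `timeit`. This is only a sketch: the function names, the sample string, and the repeat count are arbitrary choices for illustration, and here both loops live inside functions, so the baseline looks `ord` up as a global rather than with the module-level lookup from the snippet above. The measured gap depends on the Python version and may be very small.

```python
import timeit


def sum_global(text):
    """Look up ord through the global/builtin namespaces on every iteration."""
    total = 0
    for c in text:
        total += ord(c)
    return total


def sum_local(text):
    """Bind ord to a local name once, then call the local."""
    ordinal = ord
    total = 0
    for c in text:
        total += ordinal(c)
    return total


sample = "spam" * 10_000

# Both variants must agree before their timings are worth comparing.
assert sum_global(sample) == sum_local(sample)

for fn in (sum_global, sum_local):
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{fn.__name__}: {seconds:.4f}s for 20 runs")
```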
> And also, the unary `+` of strings only copies strings, which should be redundant in most cases.
Are you sure about that?

    >>> s = +"a"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bad operand type for unary +: 'str'

I don't think it works. What version of Python are you using?
--
Steve
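To make the two points in this message concrete, the sketch below shows that unary plus on an object dispatches to its `__pos__` method, and that a plain `str` has no such method. `OrdStr` is a hypothetical subclass written only for this illustration; it is not part of Python and is not being proposed here.

```python
class OrdStr(str):
    """Hypothetical str subclass whose unary plus returns the code point."""

    def __pos__(self):
        # ord() requires a string of length 1 and raises TypeError otherwise.
        return ord(self)


code_point = +OrdStr("a")
print(code_point)

# A plain str does not define __pos__, so unary plus raises TypeError.
plain = "a"
error = None
try:
    +plain
except TypeError as exc:
    error = exc
print(error)
```

Anyone who really wants this behaviour can opt in through a subclass like this without any change to the built-in `str` type.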
You all have been giving pretty great ideas. But this is the one I'm considering the most:
> On 2021-10-12 13:49, Steven D'Aprano wrote:
>> On Tue, Oct 12, 2021 at 11:36:42PM +1100, Chris Angelico wrote:
>>> You haven't given any reason why unary plus should imply ord().
>> I think the question Chris is really asking is why should unary plus return ord() rather than any other function or method.
>> We could make unary plus of a string equal to the upper() method:
>>     +"Hello world" # returns "HELLO WORLD"
>> or the strip() method:
>>     +" Hello world " # returns "Hello world"
>> or len():
>>     +"Hello world" # returns 11
>> or any other function or method we want. What is so special about ord(), and what is the connection between ord() and `+` that makes it obvious that +"a" should return 97 rather than "A" or 1 or 10 or something else?
>> It's not enough to just say that unary plus is unused for strings, you have to justify why the average programmer will look at unary plus and immediately think "ord".
>> -- Steve
So I'll post another thread about unary operators for strings. The idea has expanded from just unary positive to all unary operators.
participants (5)
- Chris Angelico
- Jeremiah Vivian
- MRAB
- Ricky Teachey
- Steven D'Aprano