Implementing string unary operators

I started a previous thread about overloading the unary `+` operator on strings to call `ord`, and that discussion expanded beyond just the unary `+` operator. So I'm now proposing these implementations:
Or:
If anyone has better ideas, they can post it here.

I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.

On Wed, Oct 13, 2021 at 11:21 AM MarylandBall Productions <shayaanwadkar@gmail.com> wrote:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
Can you give another example of where you invert something into another type? I don't understand the analogy you're making here. ChrisA

On Wed, Oct 13, 2021 at 12:05:35AM -0000, MarylandBall Productions wrote:
How is `int(string, 16)` "inverting"? Inverting means to flip or reverse, not to convert to another type. Even if it did, why convert to an int, rather than a float, or a list, or bytes, or any other type? You could maybe make an argument for ~"Hello" to invert it by reversing horizontally "olleH" but even that would be pretty weak.
~string will be a confusing operation to many users.
I think it would be *all* users. I doubt that anyone could predict that the ~ operator converts to int using base 16 just by looking at the expression in isolation. -- Steve

It was written:
How is `int(string, 16)` "inverting"?
It's the inverse of f"{number:x}", of course. Mappings between types are ubiquitous, and (more or less) invertible ones are not uncommon. It's an honest question, but I suggest we let slightly odd usage, especially in scare quotes, pass. It's not essential to understand the proposal. Steve
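Here is a small illustration of the round trip Stephen describes. It formats an int with the `x` format spec and parses it back with `int(s, 16)`. The helper names `to_hex` and `from_hex` are hypothetical and exist only for this example. The round trip is exact in the int → str → int direction. The other direction is only "more or less" invertible, because parsing accepts spellings (uppercase digits, a `0x` prefix) that formatting never produces.

```python
def to_hex(n: int) -> str:
    """Hypothetical helper: format an int as lowercase hex digits, no "0x" prefix."""
    return f"{n:x}"


def from_hex(s: str) -> int:
    """Hypothetical helper: parse hex text, the operation ~string was proposed for."""
    return int(s, 16)


# int -> str -> int is an exact round trip, including negatives ("-2a").
for n in (0, 255, 4096, -42):
    assert from_hex(to_hex(n)) == n

# str -> int -> str is not: several spellings parse to the same number.
for spelling in ("ff", "FF", "0xff"):
    assert from_hex(spelling) == 255
print(to_hex(from_hex("FF")))  # ff
```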

13.10.21 03:05, MarylandBall Productions writes:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
Then it should be a shorthand for json.loads().

On Wed, Oct 13, 2021 at 10:53 AM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
Better idea: Just don't. These are incredibly arbitrary and have very little association with their symbols. They don't even have the "cute" value of constructing a path by dividing a path by a string, or constructing an email address using matrix multiplication. Don't search for a meaning for some combination of symbols. Start with meaning, and then think about the best way to spell it. In each of these cases, the existing spelling is FAR better than anything involving an operator. ChrisA
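For contrast, here is the existing "cute" overload Chris refers to. `pathlib` defines `/` on path objects, so a path divided by a string is a longer path. The sketch uses `PurePosixPath` so the printed result does not depend on the host OS (a choice made for this example), and the directory names are made up.

```python
from pathlib import PurePosixPath

# PurePath implements __truediv__ (path / str) and __rtruediv__ (str / path),
# so "dividing" joins path components.
base = PurePosixPath("/home/user")
config = base / "projects" / "settings.toml"
relative = "docs" / PurePosixPath("index.md")

print(config)    # /home/user/projects/settings.toml
print(relative)  # docs/index.md
```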

On Tue, Oct 12, 2021 at 16:51, Jeremiah Vivian (<nohackingofkrowten@gmail.com>) wrote:
Your other post mostly attracted sarcastic replies, so I'll be more direct: It's highly unlikely that this will go anywhere. To get a new operator on a builtin type, you'll have to show that:

- It's a common operation;
- There's no convenient way to do it already; and
- The meaning of the operator is reasonably clear to a reader of the code.

Recent examples of new features that met that bar are dict | in https://www.python.org/dev/peps/pep-0584 and matrix multiply in https://www.python.org/dev/peps/pep-0465/. I don't think any of these proposals come close to meeting those criteria.

The idea to use "-" in the context of strings may have some merit. Not as unary minus, but as a sequence operation and shorthand for str.removesuffix(x):

s = 'abc' + 'def' - 'ef' + 'gh'

giving s == 'abcdgh'

Removing suffixes from strings is a rather common operation. Removing prefixes is common as well, so perhaps "~" could be mapped to str.removeprefix():

s = 'abcdef' ~ 'abc'

giving s == 'def'

In a similar way, "/" could be mapped to str.split(), since that's probably even more common:

l = 'a,b,c,d' / ','

giving: l == ['a', 'b', 'c', 'd']

Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though:

s = a - s
s = a / c
s = a ~ p

By adding such operators we could potentially make math functions compatible with strings by way of duck typing, giving some really weird results instead of errors.

Cheers,
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 13 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
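Built-in `str` cannot be given new operators from Python code. One way to experiment with Marc-Andre's `-` and `/` proposal is a `str` subclass. `OpStr` below is a hypothetical name, and the class is only a sketch. It needs Python 3.9+ for `str.removesuffix()`. A binary `~` for `removeprefix()` cannot be sketched this way at all, because Python's grammar only has a unary `~`.

```python
class OpStr(str):
    """Hypothetical str subclass sketching the proposed string operators."""

    def __sub__(self, suffix):
        # "-" as an alias for removesuffix(); wrap the result so chains keep working.
        if not isinstance(suffix, str):
            return NotImplemented
        return OpStr(self.removesuffix(suffix))

    def __truediv__(self, sep):
        # "/" as an alias for split(sep).
        if not isinstance(sep, str):
            return NotImplemented
        return self.split(sep)

    def __add__(self, other):
        # Keep the subclass through concatenation so that x - old + new - more works.
        if not isinstance(other, str):
            return NotImplemented
        return OpStr(super().__add__(other))


s = OpStr('abc') + 'def' - 'ef' + 'gh'
print(s)                       # abcdgh
print(OpStr('a,b,c,d') / ',')  # ['a', 'b', 'c', 'd']
```

Only the leftmost operand has to be an `OpStr`. A plain `str` on the left still raises `TypeError` for `-` and `/`, which is why the real proposal would require changing `str` itself.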

On Wed, Oct 13, 2021 at 7:57 PM Marc-Andre Lemburg <mal@egenix.com> wrote:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
Less obvious but convenient also. Unfortunately, tilde is a unary operator, so this won't actually work.
Definitely. In any language that supports this, I use it frequently. It should be matched with seq*str to join: ["a", "b", "c","d"] * "," to give "a,b,c,d".
In my experience, there's often one constant and one variable involved, such as:

lines = data / "\n"
words = line / " "
outputfile = inputfile - ".md" + ".html"
Maybe, but I wouldn't consider that to be a particularly high priority. If they work, great, if they don't, so be it. The math module itself is primarily focused on float math, not even int. +1. ChrisA

Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef". Steve
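To make the ambiguity concrete, the sketch below spells out three possible meanings for a string `-` using existing methods. The first is suffix removal (Python 3.9+ `removesuffix`), the second removes every occurrence, and the third removes only the first occurrence. The function names are hypothetical.

```python
def minus_suffix(x: str, y: str) -> str:
    """Option 1: remove y only when x ends with it (str.removesuffix, 3.9+)."""
    return x.removesuffix(y)


def minus_all(x: str, y: str) -> str:
    """Option 2: "remove all", i.e. delete every occurrence of y."""
    return x.replace(y, "")


def minus_first(x: str, y: str) -> str:
    """Option 3: delete only the first occurrence of y, wherever it is."""
    return x.replace(y, "", 1)


for x in ("abcdef", "efefef", "efabcd"):
    print(f"{x!r:10} suffix={minus_suffix(x, 'ef')!r:10} "
          f"all={minus_all(x, 'ef')!r:10} first={minus_first(x, 'ef')!r}")
```

Under "remove all", both of Steve's examples hold: "efefef" becomes "", and "efabcd" and "abcdef" both become "abcd". Only the suffix reading leaves "efabcd" untouched.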

Maybe we should only accept operators as aliases for existing methods.

x-y could mean x.removesuffix(y).

I don't think x~y is intuitive enough to use.

On Wed, Oct 13, 2021 at 8:03 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 13.10.2021 17:11, Guido van Rossum wrote:
Maybe we should only accept operators as aliases for existing methods.
x-y could mean x.removesuffix(y)
That was the idea, yes, in particular to make it similar to "+", which adds to the end of the string, so that: s = x - oldend + newend works as expected.
I don't think x~y is intuitive enough to use.
True. I tried to find an operator that looked similar to "-", but "~" would only work as a unary operator, as Chris correctly pointed out. Even if it were a binary one, it would look too similar to "-" and also doesn't play well when used on a single line:

s = newstart + (x ~ oldstart)

So I withdraw that proposal.
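Chris's objection is about the grammar, not about method lookup. Python only has a prefix `~`, so the parser rejects a binary `x ~ y` before any dunder method could be consulted. The hypothetical `parses()` helper below checks this with the standard `ast` module. The sketch also shows that a unary `~` on a string parses but fails at runtime.

```python
import ast


def parses(source: str) -> bool:
    """Hypothetical helper: does `source` parse as a single Python expression?"""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


def invert_str_raises() -> bool:
    """Unary ~ on a str parses fine but fails at runtime: str has no __invert__."""
    try:
        ~"abc"
    except TypeError:
        return True
    return False


print(parses("~y"))         # True: unary invert
print(parses("x ~ y"))      # False: there is no binary ~
print(parses("x - ~y"))     # True: binary minus applied to the unary invert of y
print(invert_str_raises())  # True
```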

On 2021-10-13 16:26, Marc-Andre Lemburg wrote:
From a mathematical point of view, x-y is equivalent to x+(-y). That leads me to "negative" strings that, when added to a string, remove characters instead of adding them. For example:

"abcdef" - "ef" == "abcdef" + (-"ef") == "abcd"

and also:

(-"ab") + "abcdef" == "cdef"

Voilà! An alternative to .removeprefix. :-)
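MRAB's "negative strings" can be modelled with a small wrapper type. In this sketch, all names are hypothetical and Python 3.9+ is assumed. `-S("ef")` produces a `NegStr`. Adding it to the right of a string strips a suffix, and adding it to the left strips a prefix, so `x + (-y)` behaves like the proposed `x - y` for suffixes. It is a toy: it does not define what `NegStr + NegStr` or `-(-s)` should mean.

```python
class NegStr:
    """Hypothetical "negative string": adding it to a str removes text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __radd__(self, other):
        # str + NegStr: str has no numeric "+" for this operand, so Python
        # calls NegStr.__radd__, which removes a matching suffix.
        if isinstance(other, str):
            return other.removesuffix(self.text)
        return NotImplemented

    def __add__(self, other):
        # NegStr + str: the negative string is on the left, so remove a prefix.
        if isinstance(other, str):
            return other.removeprefix(self.text)
        return NotImplemented


class S(str):
    """Hypothetical str subclass whose unary minus yields a NegStr."""

    def __neg__(self) -> NegStr:
        return NegStr(str(self))


print("abcdef" + (-S("ef")))   # abcd
print((-S("ab")) + "abcdef")   # cdef
```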

MRAB wrote:
From a mathematical point of view, x-y is equivalent to x+(-y).
From a mathematical point of view, x+y is equivalent to y+x, but I suppose that ship has sailed a long long time ago. ("++", "--", etc. would have been better choices for operators)[*] Anyway, if you're going to add these operators for strings, I suggest also adding them for lists. And also make them work with the new match syntax. [*] Floating point x+y isn't always y+x, but floating point is its own problematic world.

Peter Ludemann writes:
From a mathematical point of view, x+y is equivalent to y+x,
Yes, other things being equal, mathematical purists would prefer '*' to '+' for string concatenation *because* it's not commutative, but they're not equal. I imagine a good majority of folks can guess what '" " * 8' and '"foot" + "ball"' mean, but they'd have a lot of trouble even with '"two" * "two"' and '" " + 8'.
"++", "--", etc. would have been better choices for operators
I don't think I agree that proliferating operator symbols for different types is a better idea. I think making judicious choices for overloading (with the default being Just Say No) is the best we're going to get. Steve

On Wed, Oct 13, 2021, 11:01 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef". Steve

Maybe it should be "remove the first one" rather than a suffix. Then for removal of multiple substrings, you could allow the RHS to be a sequence of substrings:
BTW these are interesting ideas and I'm just exploring the possibilities. Not proposing or agreeing with anything.

On 2021-10-14 at 00:00:25 +0900, "Stephen J. Turnbull" <stephenjturnbull@gmail.com> wrote:
I don't know whether it qualifies as prior art, but in Erlang (a language emphatically *not* known for its string handling), strings are lists of codepoints, and the list subtraction operator¹ is spelled "--":

The list subtraction operator -- produces a list that is a copy of the first argument. The procedure is as follows: for each element in the second argument, the first occurrence of this element (if any) is removed. Example:

2> [1,2,3,2,1,2]--[2,1,2].
[3,1,2]

And from my interactive prompt:

4> "abcdef" -- "ef".
"abcd"
5> "abcdef" -- "ab".
"cdef"

¹ http://erlang.org/doc/reference_manual/expressions.html#list-operations
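The Erlang `--` semantics quoted here translate directly into Python's `list.remove()`, which deletes only the first matching element. The function name `erlang_subtract` is hypothetical. Because the operation works element by element, on strings it is a multiset difference of characters, not substring removal. For example, `"abcdef" -- "fe"` also yields `"abcd"`.

```python
from collections.abc import Sequence


def erlang_subtract(first: Sequence, second: Sequence) -> list:
    """Hypothetical helper mimicking Erlang's  First -- Second.

    Start from a copy of `first`; for each element of `second`, remove the
    first remaining occurrence of that element, if any.
    """
    result = list(first)
    for item in second:
        try:
            result.remove(item)  # list.remove drops only the first match
        except ValueError:
            pass  # element not present: Erlang silently ignores it
    return result


print(erlang_subtract([1, 2, 3, 2, 1, 2], [2, 1, 2]))  # [3, 1, 2]
# Erlang strings are lists of code points; join the result back for str input.
print("".join(erlang_subtract("abcdef", "ef")))  # abcd
print("".join(erlang_subtract("abcdef", "ab")))  # cdef
```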

On Thu, Oct 14, 2021 at 2:21 AM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
It definitely counts as prior art. Another language that allows string subtraction is Pike:
"abcdef" - "ab"; (1) Result: "cdef"
I don't think it's quite such a foot-gun as you might think; but on the other hand, I'm also quite happy to see string subtraction defined in terms of removesuffix, since that IS a closer inverse to string addition. It'd end up being one of those cases where both behaviours are useful, and I'd just have to change gears when coding in multiple languages. (Which has to happen anyway. There's plenty of little differences, like modulo with negative numbers.) ChrisA

On 2021-10-14 at 04:34:24 +1100, Chris Angelico <rosuav@gmail.com> wrote:
The footgun is coding in multiple languages. You're in a maze of twisty little passages, all different. ;-) Just remember to use your clutch when you change gears. So aside from filename extensions, what are the real use cases for suffix removal? Plurals? No, too locale-dependent and too many exceptions. Whitespace left over from external data? No, there are already other functions for that (and regexen and actual parsers if they're not good enough). Directory traversal? No, that's what path instances and the os module are for.
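For the filename-extension case specifically, `pathlib` path instances already provide an explicit spelling. The sketch below uses `PurePosixPath`, chosen so the output does not depend on the OS, and made-up file names.

```python
from pathlib import PurePosixPath

src = PurePosixPath("docs/notes.md")
dst = src.with_suffix(".html")  # replaces only the final suffix
print(dst)                      # docs/notes.html
print(src.stem, src.suffix)     # notes .md

# with_suffix() swaps only the last extension of a multi-part name.
print(PurePosixPath("archive.tar.gz").with_suffix(".zip"))  # archive.tar.zip
```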

On Wed, 13 Oct 2021 at 19:02, <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
I think this is a good point. Is removesuffix really useful enough to warrant having an operator *as well as* a string method? It was only added in 3.9, so we've been managing without it at all for years, after all... Paul

On 13.10.2021 20:47, Paul Moore wrote:
Sure, but that's not evidence that this kind of operation is not common. Some examples:

- removal of file extensions
- removal of end tags
- removal of units
- removal of currencies
- removal of standard suffixes
- removal of wildcard patterns

etc. I find lots of such uses in the code bases I work with.

13.10.21 22:03, Marc-Andre Lemburg writes:
I have not had an opportunity to use removesuffix yet. The problem is that almost always, when I need to remove a suffix or prefix, I need to know whether it was removed or not. removesuffix does not help; it does not even make the code shorter. And I need to support versions older than 3.9.
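Serhiy's point is that callers often need to know whether anything was removed. Below is a hypothetical helper that returns both the result and a flag. It uses `endswith()` and slicing, so it also runs on versions before 3.9.

```python
from typing import Tuple


def cut_suffix(text: str, suffix: str) -> Tuple[str, bool]:
    """Hypothetical helper: strip `suffix` if present and report whether it was.

    Uses endswith() and slicing only, so it also runs on Python < 3.9.
    """
    # An empty suffix needs a guard: "x".endswith("") is True, and
    # text[:-0] is text[:0], which would wrongly return "".
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)], True
    return text, False


stem, removed = cut_suffix("report.csv", ".csv")
if removed:
    print("CSV input:", stem)  # CSV input: report
```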

Greetings list,

Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though:

s = a - s
s = a / c
s = a ~ p

By adding such operators we could potentially make math functions compatible with strings by way of duck typing, giving some really weird results instead of errors.

Well, I think that since everything in Python is an object, this example can apply to anything. You need to put it in context. It's not a problem per se.

Kind Regards,

Abdur-Rahmaan Janhangeer
about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com>
github <https://github.com/Abdur-RahmaanJ>
Mauritius

On Tue, Oct 12, 2021 at 05:10:45PM -0700, Jelle Zijlstra wrote:
Your other post mostly attracted sarcastic replies, so I'll be more direct: It's highly unlikely that this will go anywhere.
Jelle, the second part of your sentence may be true, but the first part is not. It is unfair and inaccurate to say that the other post "mostly" attracted sarcastic replies. There was exactly one post that used sarcasm, by Ricky, and that was not biting or aggressive sarcasm, but just a bit of humour: that using unary operators for swapcase and titlecase could save a lot of typing. Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x. -- Steve

On Wed, Oct 20, 2021 at 11:10:52AM +1100, Chris Angelico wrote:
Hah, well spotted! Ironically, there is an uppercase eszett, 'ẞ', although font support for it may still be limited. (Come on, font designers, it has only been official in Unicode since 2008 and in German orthography since 2017.) -- Steve

I'm here all week. Tip your wait staff. Also, genuine apologies if mine was perceived as mean-sarcastic. It was definitely sarcastic, but I hoped it was fun enough in tone not to seem mean-spirited. I apologize sincerely and without reservation, and I will do better next time. :)

On Tue, Oct 19, 2021, 8:35 PM Steven D'Aprano <steve@pearwood.info> wrote:

On Wed, Oct 20, 2021 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
Yes (and it shows up fine in both my web browser and my terminals), but that only makes swapcase worse.
Fortunately, you can always rely on casefold to make things consistent:

"ẞ".casefold() == "ß".casefold() == "SS".casefold() == "ss".casefold()
True

TBH swapcase is a bit of a minefield if you don't know what language you're working with.

"Iİıi".swapcase()
'ii̇II'
I'm not sure I've ever used it in production. Normally it's just upper(), lower(), or title() for conversions, and casefold() for comparisons. The most logical "negation" of a string would be reversing it, which WOULD be... well, reversible. But that doesn't need an operator, since it already has slice notation. ChrisA
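The sketch below checks Steve's "~~x == x" invariant against the examples in this exchange. Applying `swapcase()` twice does not always return the original string, while reversing by slicing does, at the code-point level. It also confirms the `casefold()` behaviour shown above.

```python
samples = ["Hello", "ß", "Iİıi", "\u1e9e"]  # the last one is the uppercase eszett

for s in samples:
    # Reversing twice always restores the original sequence of code points.
    assert s[::-1][::-1] == s
    round_trip = s.swapcase().swapcase()
    print(repr(s), "->", repr(s.swapcase()), "->", repr(round_trip), round_trip == s)

# casefold() maps all four spellings of the eszett to the same "ss".
assert {"\u1e9e".casefold(), "ß".casefold(), "SS".casefold(), "ss".casefold()} == {"ss"}
```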

On Wed, Oct 20, 2021 at 12:02 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
True, but since all of Python's indexing, slicing, etc, is defined by codepoints, what that really means is that slicing/reversing a string can cause peculiar behaviours. But that's true of many other types of characters too; if you mix LTR and RTL text, with some directionless text in between, you'll see some very peculiar behaviour when you reverse it (try getting an English word, some ASCII digits, and an Arabic word - the digits are the same ones that both languages use). So I don't think combining characters are unique here. "Reversing" text means many different things depending on context. ChrisA
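Chris's caveat about reversal is easy to reproduce with a combining accent. Python slices by code point, so reversal moves the mark away from its base letter. NFC normalization, via the standard `unicodedata` module, avoids the problem in this particular case because a precomposed `é` exists. It is not a general fix: many combining sequences have no precomposed form, and it does nothing for the bidirectional-text issue he mentions.

```python
import unicodedata

decomposed = "cafe\u0301"  # "café" spelled as "e" + U+0301 COMBINING ACUTE ACCENT
reversed_decomposed = decomposed[::-1]
# The accent now leads the string, attached to nothing.
print(reversed_decomposed == "\u0301efac")  # True

composed = unicodedata.normalize("NFC", decomposed)  # "e" + U+0301 -> U+00E9
reversed_composed = composed[::-1]
print(reversed_composed == "\u00e9fac")  # True
```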

On Wed, Oct 20, 2021 at 11:30:50AM +0900, Stephen J. Turnbull wrote:
Yes, I did. Turkish (and a couple of other Turkic languages) have dotted and dotless I, and their case rules differ from the default Unicode rules.
The Unicode titlecase algorithm is used for languages where digraphs like Lj, Nj, or Dz are classified as single letters of the alphabet. For example, the Polish word Dziewczyna uses the Dz digraph as the first letter. If you use the Unicode code point U+01F3, we get:

'ǳiewczyna'.upper()  # returns 'ǱIEWCZYNA'
'ǳiewczyna'.title()  # returns 'ǲiewczyna'

In case the glyphs don't show up for you, they are dz for the lowercase, DZ for the uppercase, and Dz for the titlecase. swapcase() doesn't use titlecase. Titlecased characters remain unchanged when you use swapcase() on them. -- Steve
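Steve's digraph example can be checked with escape sequences, which sidesteps the font problem he mentions. U+01F3, U+01F1 and U+01F2 are the lowercase, uppercase and titlecase forms of the single-code-point dz. The asserts below restate his claims.

```python
lower_dz, upper_dz, title_dz = "\u01f3", "\u01f1", "\u01f2"
word = lower_dz + "iewczyna"

assert word.upper() == upper_dz + "IEWCZYNA"
assert word.title() == title_dz + "iewczyna"

# The titlecase letter is classified as neither upper- nor lowercase...
assert not title_dz.isupper() and not title_dz.islower() and title_dz.istitle()
# ...so swapcase() has nothing to swap and leaves it unchanged.
assert title_dz.swapcase() == title_dz
print(word.title())
```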

On Tue, Oct 12, 2021 at 5:21 PM Jeremiah Vivian < nohackingofkrowten@gmail.com> wrote:
So I guess I'll just have to keep this to myself.
I know this is disappointing, but in this case I agree with Jelle -- this particular idea does not fit well in Python's design, it looks like an attempt at saving one opcode (or a few characters to type) for a relatively rare use case. However, I recommend that you don't give up! There are many ways Python can still use improvement, and a few more attempts will help you calibrate your ideas with what might be acceptable. I would also like to remind various other posters that sarcasm is *not* a good way to welcome newbies. The name of the list is python-ideas, not python-ideas-to-shoot-down-sarcastically.

On Tue, Oct 12, 2021 at 05:30:13PM -0700, Guido van Rossum wrote:
Guido, it isn't fair of you to jump into this thread and start scolding us for being sarcastic. Jelle's accusation that we "mostly" replied to Jeremiah with sarcasm was inaccurate. The regulars here (especially Chris) spent a lot of time and effort trying to get Jeremiah to understand the need to justify the association between ord(), or some other arbitrary function, with the unary plus operator (or any other arbitrary operator). It is not nice to dismiss that effort as sarcastically shooting the idea down. -- Steve

On Tue, Oct 12, 2021 at 11:50:27PM -0000, Jeremiah Vivian wrote:
Did you actually read people's comments in that other thread? If you did read that thread, you should have understood that before anyone takes your proposal seriously, you **must** justify why the unary operators should do what you want them to do.

So far, you have suggested:

+"s" == ord("s")
+string = int(string)
+string = string.lstrip()

with no justification for any of these beyond an *assumption* that calling the function ord() might be slower than using a unary operator. (That might be true, maybe, but I doubt it would be significantly slower.)

The point of my earlier email was to make it clear to you how random and arbitrary the choice of ord() for unary plus was, not to convince you to choose a different random and arbitrary choice.

What part of `+string` do you think CLEARLY and OBVIOUSLY means "convert the string to an int"? Why would `-string` mean "convert to an int using octal" and `~string` mean base 16? Why not the other way? Why map `+string` to lstrip() and `-string` to rstrip() instead of the other way? All these choices seem random and arbitrary.

You have still not posted any solid justification for why strings should support unary operators. You haven't even said "because I'm lazy and don't want to type a function name". At least that would be a reason. A bad reason, but still better than no reason at all.

So let me be completely frank:

- I think you have zero chance of this proposal being accepted.
- But if you were to have *any* chance at all, even one in a hundred million, you need to start by giving some good reasons why unary operators should be used for strings at all.
- You need to justify the choices. What part of `~string` will make the average Python programmer think of converting to an int in hex, or stripping whitespace?

-- Steve
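Every operator meaning listed in this message already has an explicit spelling, and the sketch below pairs each one with its call. The sample strings are made up. The base-8 and base-16 meanings for `-string` and `~string` are the ones Steve cites.

```python
text = "  42  "

code_point = ord("s")            # proposed as +"s"
as_int = int(text)               # proposed as +string; int() ignores surrounding whitespace
from_octal = int("17", 8)        # proposed as -string (base 8)
from_hex = int("ff", 16)         # proposed as ~string (base 16)
left_stripped = text.lstrip()    # proposed as +string (the lstrip variant)
right_stripped = text.rstrip()   # proposed as -string (the rstrip variant)

print(code_point, as_int, from_octal, from_hex)   # 115 42 15 255
print(repr(left_stripped), repr(right_stripped))  # '42  ' '  42'
```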

Now I didn't expect this thread to blow up in replies with alternatives, specifically `str1 / str2` for 'str1.split(str2)' and `seq1 * str` for 'str.join(seq1)'.

I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.

On Wed, Oct 13, 2021 at 11:21 AM MarylandBall Productions <shayaanwadkar@gmail.com> wrote:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
Can you give another example of where you invert something into another type? I don't understand the analogy you're making here. ChrisA

On Wed, Oct 13, 2021 at 12:05:35AM -0000, MarylandBall Productions wrote:
How is `int(string, 16)` "inverting"? Inverting means to flip or reverse, not to convert to another type. Even if it did, why convert to an int, rather than a float, or a list, or bytes, or any other type? You could maybe make an argument for ~"Hello" to invert it by reversing horizontally "olleH" but even that would be pretty weak.
~string will be a confusing operation to many users.
I think it would be *all* users. I doubt that anyone could predict that the ~ operator converts to int using base 16 just by looking at the expression in isolation. -- Steve

It was written:
How is `int(string, 16)` "inverting"?
It's the inverse of f"{number:x}", of course. Mappings between types are ubiquitous, and (more or less) invertible ones are not uncommon. It's an honest question, but I suggest we let slightly odd usage, especially in scare quotes, pass. It's not essential to understand the proposal. Steve

13.10.21 03:05, MarylandBall Productions пише:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
Then it should be a shorthand for json.loads().

On Wed, Oct 13, 2021 at 10:53 AM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
Better idea: Just don't. These are incredibly arbitrary and have very little association with their symbols. They don't even have the "cute" value of constructing a path by dividing a path by a string, or constructing an email address using matrix multiplication. Don't search for a meaning for some combination of symbols. Start with meaning, and then think about the best way to spell it. In each of these cases, the existing spelling is FAR better than anything involving an operator. ChrisA

El mar, 12 oct 2021 a las 16:51, Jeremiah Vivian (< nohackingofkrowten@gmail.com>) escribió:
Your other post mostly attracted sarcastic replies, so I'll be more direct: It's highly unlikely that this will go anywhere. To get a new operator on a builtin type, you'll have to show that: - It's a common operation; - There's no convenient way to do it already; and - The meaning of the operator is reasonably clear to a reader of the code. Recent examples of new features that met that bar are dict | in https://www.python.org/dev/peps/pep-0584 and matrix multiply in https://www.python.org/dev/peps/pep-0465/. I don't think any of these proposals come close to meeting those criteria.

The idea to use "-" in the context of strings may have some merrit. Not as unary minus, but as sequence operation and shorthand for str.removesuffix(x): s = 'abc' + 'def' - 'ef' + 'gh' giving s == 'abcdgh' Removing suffixes from strings is a rather common operation. Removing prefixes is common as well, so perhaps "~" could be mapped to str.removeprefix(): s = 'abcdef' ~ 'abc' giving s == 'def' In a similar way, "/" could be mapped to str.split(), since that's probably even more common: l = 'a,b,c,d' / ',' giving: l == ['a', 'b', 'c', 'd'] Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though: s = a - s s = a / c s = a ~ p By adding such operators we could potentially make math functions compatible with strings by the way of duck typing, giving some really weird results, instead of errors. Cheers, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 13 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/

On Wed, Oct 13, 2021 at 7:57 PM Marc-Andre Lemburg <mal@egenix.com> wrote:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
Less obvious but convenient also. Unfortunately, tilde is a unary operator, so this won't actually work.
Definitely. In any language that supports this, I use it frequently. It should be matched with seq*str to join: ["a", "b", "c","d"] * "," to give "a,b,c,d".
In my experience, there's often one constant and one variable involved, such as: lines = data / "\n" words = line / " " outputfile = inputfile - ".md" + ".html"
Maybe, but I wouldn't consider that to be a particularly high priority. If they work, great, if they don't, so be it. The math module itself is primarily focused on float math, not even int. +1. ChrisA

Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef". Steve

Maybe we should only accept operators as aliases for existing methods. x-y could mean x.removesuffix(y) I don't think x~y is intuitive enough to use. On Wed, Oct 13, 2021 at 8:03 AM Stephen J. Turnbull < stephenjturnbull@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 13.10.2021 17:11, Guido van Rossum wrote:
Maybe we should only accept operators as aliases for existing methods.
x-y could mean x.removesuffix(y)
That was the idea, yes, in particular to make it similar to "+", which adds to the end of the string, so that: s = x - oldend + newend works as expected.
I don't think x~y is intuitive enough to use.
True. I tried to find an operator that looked similar to "-", but "~" would only work as unary operator, a Chris correctly pointed out, and even if it were a binary one, it would look too similar to "-" and also doesn't play well when used on a single line. s = newstart + (x ~ oldstart) So I withdraw that proposal.
Professional Python Services directly from the Experts (#1, Oct 13 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/

On 2021-10-13 16:26, Marc-Andre Lemburg wrote:
From a mathematical point of view, x-y is equivalent to x+(-y). That leads me to me "negative" strings that, when added to a string, remove characters instead of adding them. For example: "abcdef" - "ef" == "abcdef" + (-"ef") == "abcd" and also: (-"ab") + "abcdef" == "cdef" Voilà! An alternative to .removeprefix. :-)

MRAB wrote:
From a mathematical point of view, x-y is equivalent to x+(-y).
From a mathematical point of view, x+y is equivalent to y+x, but I suppose that ship has sailed a long long time ago. ("++", "--", etc. would have been better choices for operators)[*] Anyway, if you're going to add these operators for strings, I suggest also adding them for lists. And also make them work with the new match syntax. [*] Floating point x+y isn't always y+x, but floating point is its own problematic world.

Peter Ludemann writes:
From a mathematical point of view, x+y is equivalent to y+x,
Yes, other things being equal, mathematical purists would prefer '*' to '+' for string concatenation *because* it's not commutative, but they're not equal. I imagine a good majority of folks can guess what '" " * 8' and '"foot" + "ball"' mean, but they'd have a lot of trouble even with '"two" * "two"' and '" " + 8'.
"++", "--", etc. would have been better choices for operators
I don't think I agree that proliferating operator symbols for different types is a better idea. I think making judicious choices for overloading (with the default being Just Say No) is the best we're going to get. Steve

On Wed, Oct 13, 2021, 11:01 AM Stephen J. Turnbull < stephenjturnbull@gmail.com> wrote: Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef". Steve Maybe it should be "remove the first one" rather than a suffix. Then for removal of multiple substrings, you could allow the RHS to be a sequence of substrings:
BTW these are interesting ideas and I'm just exploring the possibilities. Not proposing or agreeing with anything.

On 2021-10-14 at 00:00:25 +0900, "Stephen J. Turnbull" <stephenjturnbull@gmail.com> wrote:
I don't know whether it qualifies as prior art, but in Erlang (a language emphatically *not* known for its string handling), strings are lists of codepoints, and the list subtraction operator¹ is spelled "--": The list subtraction operator -- produces a list that is a copy of the first argument. The procedure is a follows: for each element in the second argument, the first occurrence of this element (if any) is removed. Example: 2> [1,2,3,2,1,2]--[2,1,2]. [3,1,2] And from my interactive prompt: 4> "abcdef" -- "ef". "abcd" 5> "abcdef" -- "ab". "cdef" ¹ http://erlang.org/doc/reference_manual/expressions.html#list-operations

On Thu, Oct 14, 2021 at 2:21 AM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
It definitely counts as prior art. Another language that allows string subtraction is Pike:
"abcdef" - "ab"; (1) Result: "cdef"
I don't think it's quite such a foot-gun as you might think; but on the other hand, I'm also quite happy to see string subtraction defined in terms of removesuffix, since that IS a closer inverse to string addition. It'd end up being one of those cases where both behaviours are useful, and I'd just have to change gears when coding in multiple languages. (Which has to happen anyway. There are plenty of little differences, like modulo with negative numbers.) ChrisA
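A minimal sketch of what "string subtraction defined in terms of removesuffix" could look like, using a hypothetical `str` subclass (nothing like this exists in Python itself). It also checks the inverse property: `(a + b) - b == a` holds for any strings, because `a + b` always ends with `b`.

```python
class SubStr(str):
    """Hypothetical str subclass where `a - b` means a.removesuffix(b)."""

    def __sub__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return SubStr(self.removesuffix(other))


a, b = SubStr("foot"), "ball"
joined = SubStr(a + b)          # str.__add__ returns a plain str, so re-wrap
print(joined - b)               # foot
print((joined - b) == a)        # True: subtraction undoes addition
print(SubStr("efabcd") - "ef")  # efabcd: no suffix match, so unchanged
```

The converse does not hold: `(a - b) + b` only gives back `a` when `a` already ended with `b`.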

On 2021-10-14 at 04:34:24 +1100, Chris Angelico <rosuav@gmail.com> wrote:
The footgun is coding in multiple languages. You're in a maze of twisty little passages, all different. ;-) Just remember to use your clutch when you change gears. So aside from filename extensions, what are the real use cases for suffix removal? Plurals? No, too locale-dependent and too many exceptions. Whitespace left over from external data? No, there are already other functions for that (and regexen and actual parsers if they're not good enough). Directory traversal? No, that's what path instances and the os module are for.
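For the filename and whitespace cases, here is a quick comparison of `str.removesuffix` with the tools Python already has. The file names are made up for illustration.

```python
from pathlib import PurePath

name = "archive.tar.gz"  # hypothetical file name

# String level: the caller must already know the exact suffix.
print(name.removesuffix(".gz"))            # archive.tar
print("report.txt".removesuffix(".csv"))   # report.txt (no match, unchanged)

# Path level: pathlib knows what a suffix is.
path = PurePath(name)
print(path.stem)                           # archive.tar
print(path.suffixes)                       # ['.tar', '.gz']
print(path.with_suffix(".zip").name)       # archive.tar.zip

# Leftover whitespace already has dedicated methods.
print(repr("value \r\n".rstrip()))         # 'value'
```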

On Wed, 13 Oct 2021 at 19:02, <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
I think this is a good point. Is removesuffix really useful enough to warrant having an operator *as well as* a string method? It was only added in 3.9, so we've been managing without it at all for years, after all... Paul

On 13.10.2021 20:47, Paul Moore wrote:
Sure, but that's not evidence that this kind of operation is not common. Some examples:
- removal of file extensions
- removal of end tags
- removal of units
- removal of currencies
- removal of standard suffixes
- removal of wildcard patterns
etc. I find lots of such uses in the code bases I work with.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 13 2021)
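Several of the cases listed (units, currencies, standard suffixes) involve one of several possible suffixes. `str.endswith` accepts a tuple, but `str.removesuffix` only takes a single string, so a small helper is needed. A sketch, where `strip_any_suffix` and the unit list are hypothetical:

```python
def strip_any_suffix(s: str, suffixes: tuple[str, ...]) -> str:
    """Remove the first matching suffix from `suffixes`, if any."""
    for suffix in suffixes:
        if suffix and s.endswith(suffix):
            return s[: -len(suffix)]
    return s


UNITS = (" kg", " g", " EUR", " USD")  # illustrative only

print(strip_any_suffix("12.5 kg", UNITS))  # 12.5
print(strip_any_suffix("99 USD", UNITS))   # 99
print(strip_any_suffix("42", UNITS))       # 42 (no unit, unchanged)
print("12.5 kg".endswith(UNITS))           # True: endswith takes a tuple
```

Order matters when one candidate is itself a suffix of another (for example `"kg"` and `"g"` without the leading space), so longer suffixes should be listed first.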

On 13.10.21 22:03, Marc-Andre Lemburg wrote:
I have not had an opportunity to use removesuffix yet. The problem is that almost always when I need to remove a suffix or prefix, I need to know whether it was removed or not. removesuffix does not help; it does not even make the code shorter. And I need to support versions older than 3.9.
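The pattern Serhiy describes (remove a suffix, but also learn whether anything was removed, on Python versions before 3.9) can be sketched with a hypothetical helper like this:

```python
from typing import Tuple


def cut_suffix(s: str, suffix: str) -> Tuple[str, bool]:
    """Return (result, removed). Works on Python versions without removesuffix."""
    # Guard against an empty suffix: s[:-0] would be '' rather than s.
    if suffix and s.endswith(suffix):
        return s[: -len(suffix)], True
    return s, False


stem, removed = cut_suffix("module.py", ".py")
print(stem, removed)                 # module True
print(cut_suffix("README", ".py"))   # ('README', False)
print(cut_suffix("abc", ""))         # ('abc', False)
```

On 3.9+, `new = s.removesuffix(suffix)` followed by `removed = new != s` yields the same flag, which, as Serhiy notes, is hardly shorter than the `endswith` version.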

Greetings list,
Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though:
s = a - s
s = a / c
s = a ~ p
By adding such operators we could potentially make math functions compatible with strings by way of duck typing, giving some really weird results instead of errors.
Well, I think that since everything in Python is an object, this example can apply to anything. You need to put it in context. It's not a problem per se.
Kind Regards, Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
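The duck-typing concern can be illustrated with a generic function written with numbers in mind. Today, plain strings make it fail loudly; a hypothetical `str` subclass where `/` means `split` (one of the alternatives floated in this discussion, not real Python behaviour) makes it silently return a list instead.

```python
class SplitStr(str):
    """Hypothetical str subclass where `a / c` means a.split(c)."""

    def __truediv__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return self.split(other)


def ratio(a, c):
    """Generic code written with numbers in mind."""
    return a / c


print(ratio(6, 3))                   # 2.0
try:
    ratio("a,b", ",")                # plain str: fails loudly today
except TypeError as exc:
    print("TypeError:", exc)
print(ratio(SplitStr("a,b"), ","))   # ['a', 'b']: silently a list
```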

On Tue, Oct 12, 2021 at 05:10:45PM -0700, Jelle Zijlstra wrote:
Your other post mostly attracted sarcastic replies, so I'll be more direct: It's highly unlikely that this will go anywhere.
Jelle, the second part of your sentence may be true, but the first part is not. It is unfair and inaccurate to say that the other post "mostly" attracted sarcastic replies. There was exactly one post that used sarcasm, by Ricky, and that was not biting or aggressive sarcasm, but just a bit of humour: that using unary operators for swapcase and titlecase could save a lot of typing. Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x. -- Steve
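The `~~x == x` invariant is easy to check: for integers `~x == -x - 1`, so applying `~` twice always returns the original value, and `str.swapcase` behaves the same way on plain ASCII text. A quick sketch (the sample values are arbitrary):

```python
samples_int = [0, 1, -1, 42, -1000]
samples_ascii = ["Hello", "PyThOn", "abc123", ""]

# Integer bitwise inversion is its own inverse.
print(all(~~x == x for x in samples_int))        # True
print(all(~x == -x - 1 for x in samples_int))    # True

# swapcase round-trips for ASCII-only strings.
print(all(s.swapcase().swapcase() == s for s in samples_ascii))  # True
```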

On Wed, Oct 20, 2021 at 11:10:52AM +1100, Chris Angelico wrote:
Hah, well spotted! Ironically, there is an uppercase eszett, 'ẞ', although font support for it may still be limited. (Come on, font designers, it has only been official in Unicode since 2008 and in German orthography in 2017). -- Steve

I'm here all week. Tip your wait staff. Also, genuine apologies if mine was perceived as mean-sarcastic. It was definitely sarcastic but I hoped it was fun enough in tone not to seem mean-spirited. I apologize sincerely and without reservation and I would do it better next time. :)
On Tue, Oct 19, 2021, 8:35 PM Steven D'Aprano <steve@pearwood.info> wrote:

On Wed, Oct 20, 2021 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
Yes (and it shows up fine in both my web browser and my terminals), but that only makes swapcase worse.
Fortunately, you can always rely on casefold to make things consistent:
"ẞ".casefold() == "ß".casefold() == "SS".casefold() == "ss".casefold() True
TBH swapcase is a bit of a minefield if you don't know what language you're working with.
"Iİıi".swapcase() 'ii̇II'
I'm not sure I've ever used it in production. Normally it's just upper(), lower(), or title() for conversions, and casefold() for comparisons. The most logical "negation" of a string would be reversing it, which WOULD be... well, reversible. But that doesn't need an operator, since it already has slice notation. ChrisA
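Outside ASCII the `~~x == x` round trip breaks, which is part of why swapcase is a minefield. A short sketch showing strings for which `swapcase` is not its own inverse, while slice reversal always round-trips (at the code-point level):

```python
words = ["Straße", "\u1e9e", "Iİıi"]   # \u1e9e is the capital sharp s

for w in words:
    twice = w.swapcase().swapcase()
    print(ascii(w), "->", ascii(twice), twice == w)

# 'ß'.upper() is 'SS', so the sharp s does not survive a double swapcase.
print("ß".swapcase())                           # SS

# Slice reversal, by contrast, is its own inverse.
print(all(w[::-1][::-1] == w for w in words))   # True
```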

On Wed, Oct 20, 2021 at 12:02 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
True, but since all of Python's indexing, slicing, etc, is defined by codepoints, what that really means is that slicing/reversing a string can cause peculiar behaviours. But that's true of many other types of characters too; if you mix LTR and RTL text, with some directionless text in between, you'll see some very peculiar behaviour when you reverse it (try getting an English word, some ASCII digits, and an Arabic word - the digits are the same ones that both languages use). So I don't think combining characters are unique here. "Reversing" text means many different things depending on context. ChrisA
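The code-point behaviour is easy to see with a combining accent. The sketch below uses a hypothetical `reverse_keeping_marks` helper that keeps each character together with any combining marks that follow it, via `unicodedata.combining`. It is only a rough approximation of Unicode grapheme clusters (UAX #29) and does nothing about the bidirectional-text effects mentioned above.

```python
import unicodedata


def reverse_keeping_marks(s: str) -> str:
    """Reverse s, keeping each base character together with any combining
    marks that follow it. A rough approximation of grapheme clusters only."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the previous character
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


word = "cafe\u0301"                         # 'café' with a combining acute accent
print(len(word))                            # 5 code points
print(ascii(word[::-1]))                    # '\u0301efac': the accent now leads
print(ascii(reverse_keeping_marks(word)))   # 'e\u0301fac'
```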

On Wed, Oct 20, 2021 at 11:30:50AM +0900, Stephen J. Turnbull wrote:
Yes, I did. Turkish (and a couple of other Turkic languages) have dotted and dotless I, and their case rules differ from the default Unicode rules.
The Unicode titlecase algorithm is used for languages where digraphs like Lj, Nj, or Dz are classified as single letters of the alphabet. For example, the Polish word Dziewczyna uses the Dz digraph as the first letter. If you use the Unicode code point U+01F3 we get:
'ǳiewczyna'.upper()  # returns 'ǱIEWCZYNA'
'ǳiewczyna'.title()  # returns 'ǲiewczyna'
In case the glyphs don't show up for you, they are dz for the lowercase, DZ for the uppercase, and Dz for the titlecase. swapcase() doesn't use titlecase. Titlecased characters remain unchanged when you use swapcase() on them. -- Steve
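The same behaviour can be checked with escape sequences, which avoids any font problems. This sketch also confirms that the titlecase digraph is in Unicode category `Lt` (neither upper- nor lowercase), which is why `swapcase` leaves it alone:

```python
import unicodedata

lower, upper, title = "\u01f3", "\u01f1", "\u01f2"   # dz, DZ, Dz as single code points

word = lower + "iewczyna"
print(word.upper() == upper + "IEWCZYNA")   # True
print(word.title() == title + "iewczyna")   # True

for ch in (lower, upper, title):
    print(unicodedata.name(ch), unicodedata.category(ch), ascii(ch.swapcase()))
```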

On Tue, Oct 12, 2021 at 5:21 PM Jeremiah Vivian < nohackingofkrowten@gmail.com> wrote:
So I guess I'll just have to keep this to myself.
I know this is disappointing, but in this case I agree with Jelle -- this particular idea does not fit well in Python's design, it looks like an attempt at saving one opcode (or a few characters to type) for a relatively rare use case. However, I recommend that you don't give up! There are many ways Python can still use improvement, and a few more attempts will help you calibrate your ideas with what might be acceptable. I would also like to remind various other posters that sarcasm is *not* a good way to welcome newbies. The name of the list is python-ideas, not python-ideas-to-shoot-down-sarcastically. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Tue, Oct 12, 2021 at 05:30:13PM -0700, Guido van Rossum wrote:
Guido, it isn't fair of you to jump into this thread and start scolding us for being sarcastic. Jelle's accusation that we "mostly" replied to Jeremiah with sarcasm was inaccurate. The regulars here (especially Chris) spent a lot of time and effort trying to get Jeremiah to understand the need to justify the association between ord(), or some other arbitrary function, with the unary plus operator (or any other arbitrary operator). It is not nice to dismiss that effort as sarcastically shooting the idea down. -- Steve

On Tue, Oct 12, 2021 at 11:50:27PM -0000, Jeremiah Vivian wrote:
Did you actually read people's comments in that other thread? If you did read that thread, you should have understood that before anyone takes your proposal seriously, you **must** justify why the unary operators should do what you want them to do.
So far, you have suggested:
+"s" == ord("s")
+string = int(string)
+string = string.lstrip()
with no justification for any of these beyond an *assumption* that calling the function ord() might be slower than using a unary operator. (That might be true, maybe, but I doubt it would be significantly slower.)
The point of my earlier email was to make it clear to you how random and arbitrary the choice of ord() for unary plus was, not to convince you to choose a different random and arbitrary choice.
What part of `+string` do you think CLEARLY and OBVIOUSLY means "convert the string to an int"? Why would `-string` mean "convert to an int using octal" and `~string` mean base 16? Why not the other way? Why map `+string` to lstrip() and `-string` to rstrip() instead of the other way? All these choices seem random and arbitrary.
You have still not posted any solid justification for why strings should support unary operators. You haven't even said "because I'm lazy and don't want to type a function name". At least that would be a reason. A bad reason, but still better than no reason at all.
So let me be completely frank:
- I think you have zero chance of this proposal being accepted.
- But if you were to have *any* chance at all, even one in a hundred million, you need to start by giving some good reasons why unary operators should be used for strings at all.
- You need to justify the choices. What part of `~string` will make the average Python programmer think of converting to an int in hex, or stripping whitespace?
-- Steve
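To see how arbitrary the mapping is, here is a hypothetical `str` subclass wiring up one of the proposed sets of meanings (`+` decimal, `-` octal, `~` hex) through the `__pos__`, `__neg__` and `__invert__` hooks, next to the explicit spellings that already exist:

```python
class UnaryStr(str):
    """Hypothetical str subclass wiring up one proposed set of meanings."""

    def __pos__(self):
        return int(self)        # +s  -> decimal

    def __neg__(self):
        return int(self, 8)     # -s  -> octal

    def __invert__(self):
        return int(self, 16)    # ~s  -> hexadecimal


s = UnaryStr("17")
print(+s, -s, ~s)                       # 17 15 23
print(int(s), int(s, 8), int(s, 16))    # 17 15 23: the explicit spellings
```

Nothing in `-s` hints at base 8; swapping the bodies of `__neg__` and `__invert__` would be equally defensible, whereas `int(s, 8)` documents itself.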

Now I didn't expect this thread to blow up in replies with alternatives, specifically `str1 / str2` for 'str1.split(str2)' and `seq1 * str` for 'str.join(seq1)'.
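For the `seq1 * str` idea, the string sits on the right-hand side, so it would need a reflected `__rmul__`. A sketch with a hypothetical subclass: in CPython, `list * x` tries the right operand's `__rmul__` before falling back to sequence repetition, so this method sees `words * sep`. The existing method spellings are shown alongside.

```python
class JoinStr(str):
    """Hypothetical str subclass: `seq * sep` means sep.join(seq)."""

    def __rmul__(self, other):
        # Reached for `list * JoinStr` and `tuple * JoinStr`.
        if isinstance(other, (list, tuple)):
            return self.join(other)
        return NotImplemented


sep = JoinStr(", ")
words = ["spam", "eggs", "ham"]

print(words * sep)                      # spam, eggs, ham
print(", ".join(words))                 # same result with the existing method
print("spam, eggs, ham".split(", "))    # ['spam', 'eggs', 'ham']: what `/` would do
```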
participants (15)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Abdur-Rahmaan Janhangeer
- Chris Angelico
- Guido van Rossum
- Jelle Zijlstra
- Jeremiah Vivian
- Marc-Andre Lemburg
- MarylandBall Productions
- MRAB
- Paul Moore
- Peter Ludemann
- Ricky Teachey
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano