Implementing string unary operators
I posted a previous thread about overloading the unary `+` operator in strings with `ord`, and that expanded to more than just the unary `+` operator. So I'm saying now, there should be these implementations:
+string - `int(string, 10)` (or just `int(string)`)
-string - `int(string, 8)`
~string - `int(string, 16)`
Or:
+string - `string.lstrip()`
-string - `string.rstrip()`
~string - `string.strip()`
If anyone has better ideas, they can post it here.
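To make the first proposed mapping concrete, here is a minimal sketch that prototypes it on a hypothetical `str` subclass. The name `OpStr` is an assumption for illustration only; the proposal itself concerns the built-in `str`, which has no such operators. Each operator simply delegates to the `int()` call it would stand in for:

```python
class OpStr(str):
    """Hypothetical str subclass prototyping the proposed unary operators.

    For experimentation only: built-in str does not support these.
    """

    def __pos__(self) -> int:
        # +s  ->  int(s, 10)
        return int(self, 10)

    def __neg__(self) -> int:
        # -s  ->  int(s, 8)
        return int(self, 8)

    def __invert__(self) -> int:
        # ~s  ->  int(s, 16)
        return int(self, 16)


s = OpStr("17")
print(+s, -s, ~s)  # 17 15 23
```

Nothing in the symbols themselves tells a reader which base each one uses, which is the main objection raised in the replies.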
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
On Wed, Oct 13, 2021 at 11:21 AM MarylandBall Productions <shayaanwadkar@gmail.com> wrote:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
Can you give another example of where you invert something into another type? I don't understand the analogy you're making here. ChrisA
On Wed, Oct 13, 2021 at 12:05:35AM -0000, MarylandBall Productions wrote:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type
How is `int(string, 16)` "inverting"? Inverting means to flip or reverse, not to convert to another type. Even if it did, why convert to an int, rather than a float, or a list, or bytes, or any other type? You could maybe make an argument for ~"Hello" to invert it by reversing horizontally "olleH" but even that would be pretty weak.
~string will be a confusing operation to many users.
I think it would be *all* users. I doubt that anyone could predict that the ~ operator converts to int using base 16 just by looking at the expression in isolation. -- Steve
It was written:
How is `int(string, 16)` "inverting"?
It's the inverse of f"{number:x}", of course. Mappings between types are ubiquitous, and (more or less) invertible ones are not uncommon. It's an honest question, but I suggest we let slightly odd usage, especially in scare quotes, pass. It's not essential to understand the proposal. Steve
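Stephen's point can be checked directly: formatting an int with the `x` format spec and parsing it back with `int(..., 16)` round-trips. A quick sketch (the helper names are illustrative):

```python
def to_hex(n: int) -> str:
    # Lowercase hex without a "0x" prefix; negatives get a leading "-".
    return f"{n:x}"


def from_hex(s: str) -> int:
    # Parse the hex digits back into an int.
    return int(s, 16)


for n in (0, 15, 255, 48879, -255):
    assert from_hex(to_hex(n)) == n

print(to_hex(48879))  # beef
```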
13.10.21 03:05, MarylandBall Productions wrote:
I would think `~string` could be good for a shorthand way to convert a string to an integer, considering you’re “inverting” the string to another type, though a downside to this would be that explicit is always better than implicit and ~string will be a confusing operation to many users.
Then it should be a shorthand for json.loads().
On Wed, Oct 13, 2021 at 10:53 AM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
If anyone has better ideas, they can post it here.
Better idea: Just don't. These are incredibly arbitrary and have very little association with their symbols. They don't even have the "cute" value of constructing a path by dividing a path by a string, or constructing an email address using matrix multiplication. Don't search for a meaning for some combination of symbols. Start with meaning, and then think about the best way to spell it. In each of these cases, the existing spelling is FAR better than anything involving an operator. ChrisA
On Tue, Oct 12, 2021 at 16:51, Jeremiah Vivian (<nohackingofkrowten@gmail.com>) wrote:
Your other post mostly attracted sarcastic replies, so I'll be more direct: It's highly unlikely that this will go anywhere. To get a new operator on a builtin type, you'll have to show that:
- It's a common operation;
- There's no convenient way to do it already; and
- The meaning of the operator is reasonably clear to a reader of the code.
Recent examples of new features that met that bar are dict | in https://www.python.org/dev/peps/pep-0584 and matrix multiply in https://www.python.org/dev/peps/pep-0465/. I don't think any of these proposals come close to meeting those criteria.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/E2NCFN... Code of Conduct: http://python.org/psf/codeofconduct/
13.10.21 03:10, Jelle Zijlstra wrote:
To get a new operator on a builtin type, you'll have to show that:
- It's a common operation;
- There's no convenient way to do it already; and
- The meaning of the operator is reasonably clear to a reader of the code.
Recent examples of new features that met that bar are dict | in https://www.python.org/dev/peps/pep-0584 and matrix multiply in https://www.python.org/dev/peps/pep-0465/.
I think it fails the first two criteria. It is not a common enough operation, and we already have convenient ways to do it.
The idea to use "-" in the context of strings may have some merit. Not as unary minus, but as a sequence operation and shorthand for str.removesuffix(x):
s = 'abc' + 'def' - 'ef' + 'gh'
giving
s == 'abcdgh'
Removing suffixes from strings is a rather common operation. Removing prefixes is common as well, so perhaps "~" could be mapped to str.removeprefix():
s = 'abcdef' ~ 'abc'
giving
s == 'def'
In a similar way, "/" could be mapped to str.split(), since that's probably even more common:
l = 'a,b,c,d' / ','
giving:
l == ['a', 'b', 'c', 'd']
Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though:
s = a - s
s = a / c
s = a ~ p
By adding such operators we could potentially make math functions compatible with strings by the way of duck typing, giving some really weird results, instead of errors.
Cheers,
-- Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Experts (#1, Oct 13 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
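The binary-operator variant can be prototyped without any language change. This is a minimal sketch, assuming Python 3.9+ for `str.removesuffix()`; the class name `SeqStr` is hypothetical. `__add__` is overridden as well, because the inherited `str.__add__` returns a plain `str`, and the chained `- 'ef'` would then fail. The binary `~` form is left out because `~` is only a unary operator in Python's grammar.

```python
class SeqStr(str):
    """Hypothetical str subclass: '-' as removesuffix(), '/' as split()."""

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Keep the subclass so that a later '-' or '/' still works.
        return SeqStr(str(self) + other)

    def __sub__(self, suffix: str) -> "SeqStr":
        # 'abcdef' - 'ef' -> 'abcd'; unchanged if the suffix is absent.
        return SeqStr(self.removesuffix(suffix))

    def __truediv__(self, sep: str) -> list:
        # 'a,b,c,d' / ',' -> ['a', 'b', 'c', 'd']
        return self.split(sep)


s = SeqStr('abc') + 'def' - 'ef' + 'gh'
print(s)                        # abcdgh
print(SeqStr('a,b,c,d') / ',')  # ['a', 'b', 'c', 'd']
```

The need to wrap the first operand explicitly is one reason such experiments stay awkward outside the built-in type.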
On Wed, Oct 13, 2021 at 7:57 PM Marc-Andre Lemburg <mal@egenix.com> wrote:
The idea to use "-" in the context of strings may have some merit. Not as unary minus, but as a sequence operation and shorthand for str.removesuffix(x):
s = 'abc' + 'def' - 'ef' + 'gh'
giving
s == 'abcdgh'
Removing suffixes from strings is a rather common operation.
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
Removing prefixes is common as well, so perhaps "~" could be mapped to str.removeprefix():
s = 'abcdef' ~ 'abc'
giving
s == 'def'
Less obvious but convenient also. Unfortunately, tilde is a unary operator, so this won't actually work.
In a similar way, "/" could be mapped to str.split(), since that's probably even more common:
l = 'a,b,c,d' / ','
giving:
l == ['a', 'b', 'c', 'd']
Definitely. In any language that supports this, I use it frequently. It should be matched with seq*str to join: ["a", "b", "c","d"] * "," to give "a,b,c,d".
Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though:
s = a - s
s = a / c
s = a ~ p
In my experience, there's often one constant and one variable involved, such as:
lines = data / "\n"
words = line / " "
outputfile = inputfile - ".md" + ".html"
By adding such operators we could potentially make math functions compatible with strings by the way of duck typing, giving some really weird results, instead of errors.
Maybe, but I wouldn't consider that to be a particularly high priority. If they work, great, if they don't, so be it. The math module itself is primarily focused on float math, not even int. +1. ChrisA
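For comparison, Chris's examples (and the proposed `seq * str` join) map onto existing methods as follows. A small sketch, assuming Python 3.9+ for `removesuffix()`; the sample `data` and file name are made up for illustration:

```python
data = "first line\nsecond line"
inputfile = "notes.md"

lines = data.split("\n")                               # data / "\n"
line = lines[0]
words = line.split(" ")                                # line / " "
outputfile = inputfile.removesuffix(".md") + ".html"   # inputfile - ".md" + ".html"
joined = ",".join(["a", "b", "c", "d"])                # ["a", "b", "c", "d"] * ","

print(lines)       # ['first line', 'second line']
print(words)       # ['first', 'line']
print(outputfile)  # notes.html
print(joined)      # a,b,c,d
```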
Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef". Steve
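The two candidate semantics differ exactly on Stephen's inputs. Here "remove all" is assumed to mean `str.replace(sub, "")`, which is one reading of Chris's phrase; suffix removal is `str.removesuffix()` (Python 3.9+). The helper names are illustrative:

```python
def minus_suffix(s: str, sub: str) -> str:
    # Suffix semantics: strip `sub` once, only at the end.
    return s.removesuffix(sub)


def minus_all(s: str, sub: str) -> str:
    # "Remove all" semantics: delete every occurrence of `sub`.
    return s.replace(sub, "")


print(minus_suffix("efefef", "ef"))      # efef
print(repr(minus_all("efefef", "ef")))   # ''
# Under "remove all", two different inputs collapse to the same result:
print(minus_all("efabcd", "ef") == minus_all("abcdef", "ef"))  # True
```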
Maybe we should only accept operators as aliases for existing methods.
x-y could mean x.removesuffix(y)
I don't think x~y is intuitive enough to use.
On Wed, Oct 13, 2021 at 8:03 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef".
Steve
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On 13.10.2021 17:11, Guido van Rossum wrote:
Maybe we should only accept operators as aliases for existing methods.
x-y could mean x.removesuffix(y)
That was the idea, yes, in particular to make it similar to "+", which adds to the end of the string, so that: s = x - oldend + newend works as expected.
I don't think x~y is intuitive enough to use.
True.
I tried to find an operator that looked similar to "-", but "~" would only work as a unary operator, as Chris correctly pointed out, and even if it were a binary one, it would look too similar to "-" and also doesn't play well when used on a single line:
s = newstart + (x ~ oldstart)
So I withdraw that proposal.
-- Marc-Andre Lemburg
eGenix.com
On 2021-10-13 16:26, Marc-Andre Lemburg wrote:
On 13.10.2021 17:11, Guido van Rossum wrote:
Maybe we should only accept operators as aliases for existing methods.
x-y could mean x.removesuffix(y)
That was the idea, yes, in particular to make it similar to "+", which adds to the end of the string, so that:
s = x - oldend + newend
works as expected.
I don't think x~y is intuitive enough to use.
True.
I tried to find an operator that looked similar to "-", but "~" would only work as a unary operator, as Chris correctly pointed out, and even if it were a binary one, it would look too similar to "-" and also doesn't play well when used on a single line.
s = newstart + (x ~ oldstart)
So I withdraw that proposal.
From a mathematical point of view, x-y is equivalent to x+(-y). That leads me to "negative" strings that, when added to a string, remove characters instead of adding them. For example:
"abcdef" - "ef" == "abcdef" + (-"ef") == "abcd"
and also:
(-"ab") + "abcdef" == "cdef"
Voilà! An alternative to .removeprefix. :-)
MRAB wrote:
From a mathematical point of view, x-y is equivalent to x+(-y).
From a mathematical point of view, x+y is equivalent to y+x, but I suppose that ship has sailed a long long time ago. ("++", "--", etc. would have been better choices for operators)[*] Anyway, if you're going to add these operators for strings, I suggest also adding them for lists. And also make them work with the new match syntax. [*] Floating point x+y isn't always y+x, but floating point is its own problematic world.
Peter Ludemann writes:
From a mathematical point of view, x+y is equivalent to y+x,
Yes, other things being equal, mathematical purists would prefer '*' to '+' for string concatenation *because* it's not commutative, but they're not equal. I imagine a good majority of folks can guess what '" " * 8' and '"foot" + "ball"' mean, but they'd have a lot of trouble even with '"two" * "two"' and '" " + 8'.
"++", "--", etc. would have been better choices for operators
I don't think I agree that proliferating operator symbols for different types is a better idea. I think making judicious choices for overloading (with the default being Just Say No) is the best we're going to get. Steve
On Wed, Oct 13, 2021, 11:01 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef".
Steve
Maybe it should be "remove the first one" rather than a suffix. Then for removal of multiple substrings, you could allow the RHS to be a sequence of substrings:
>>> "efaxefef" - "ef"
"axefef"
>>> "efaxefef" - ["ef", "ax"]
"efef"
>>> "efaxefef" - ["ef"]*3
"ax"
BTW these are interesting ideas and I'm just exploring the possibilities. Not proposing or agreeing with anything.
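Ricky's "remove the first one" semantics can be sketched with `str.replace(old, new, count)`. Accepting either a single string or a sequence of strings on the right-hand side is an assumption about how such an operator might be defined, and the function name is hypothetical:

```python
from typing import Sequence, Union


def remove_first(s: str, subs: Union[str, Sequence[str]]) -> str:
    """Remove the first occurrence of each substring, in order (hypothetical)."""
    if isinstance(subs, str):
        subs = [subs]
    for sub in subs:
        # count=1 removes only the leftmost remaining occurrence.
        s = s.replace(sub, "", 1)
    return s


print(remove_first("efaxefef", "ef"))           # axefef
print(remove_first("efaxefef", ["ef", "ax"]))   # efef
print(remove_first("efaxefef", ["ef"] * 3))     # ax
```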
On 2021-10-14 at 00:00:25 +0900, "Stephen J. Turnbull" <stephenjturnbull@gmail.com> wrote:
Chris Angelico writes:
+1, although it's debatable whether it should be remove suffix or remove all. I'd be happy with either.
If by "remove all" you mean "efefef" - "ef" == "", I think that's a footgun. Similarly for "efabcd" - "ef" == "abcdef" - "ef".
I don't know whether it qualifies as prior art, but in Erlang (a language emphatically *not* known for its string handling), strings are lists of codepoints, and the list subtraction operator¹ is spelled "--":
The list subtraction operator -- produces a list that is a copy of the first argument. The procedure is as follows: for each element in the second argument, the first occurrence of this element (if any) is removed.
Example:
2> [1,2,3,2,1,2]--[2,1,2].
[3,1,2]
And from my interactive prompt:
4> "abcdef" -- "ef".
"abcd"
5> "abcdef" -- "ab".
"cdef"
¹ http://erlang.org/doc/reference_manual/expressions.html#list-operations
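The Erlang rule quoted above (for each element of the right-hand list, delete its first occurrence, if any, from a copy of the left-hand list) translates directly into Python. Because Erlang strings are lists of code points, applying the same rule to strings works character by character rather than on substrings. The function name is hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def erlang_subtract(left: Sequence[T], right: Sequence[T]) -> List[T]:
    """Mimic Erlang's `--`: drop the first occurrence of each right-hand element."""
    result = list(left)  # work on a copy, as the Erlang docs describe
    for item in right:
        if item in result:
            result.remove(item)  # list.remove deletes the first match only
    return result


print(erlang_subtract([1, 2, 3, 2, 1, 2], [2, 1, 2]))  # [3, 1, 2]
print("".join(erlang_subtract("abcdef", "ef")))        # abcd
# Characters are removed individually, so "fe" removes the same two as "ef".
print("".join(erlang_subtract("abcdef", "fe")))        # abcd
```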
On Thu, Oct 14, 2021 at 2:21 AM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
I don't know whether it qualifies as prior art, but in Erlang (a language emphatically *not* known for its string handling), strings are lists of codepoints, and the list subtraction operator¹ is spelled "--":
The list subtraction operator -- produces a list that is a copy of the first argument. The procedure is as follows: for each element in the second argument, the first occurrence of this element (if any) is removed.
Example:
2> [1,2,3,2,1,2]--[2,1,2].
[3,1,2]
And from my interactive prompt:
4> "abcdef" -- "ef".
"abcd"
5> "abcdef" -- "ab".
"cdef"
¹ http://erlang.org/doc/reference_manual/expressions.html#list-operations
It definitely counts as prior art. Another language that allows string subtraction is Pike:
"abcdef" - "ab";
(1) Result: "cdef"
I don't think it's quite such a foot-gun as you might think; but on the other hand, I'm also quite happy to see string subtraction defined in terms of removesuffix, since that IS a closer inverse to string addition. It'd end up being one of those cases where both behaviours are useful, and I'd just have to change gears when coding in multiple languages. (Which has to happen anyway. There's plenty of little differences, like modulo with negative numbers.) ChrisA
On 2021-10-14 at 04:34:24 +1100, Chris Angelico <rosuav@gmail.com> wrote:
I don't think it's quite such a foot-gun as you might think; but on the other hand, I'm also quite happy to see string subtraction defined in terms of removesuffix, since that IS a closer inverse to string addition. It'd end up being one of those cases where both behaviours are useful, and I'd just have to change gears when coding in multiple languages. (Which has to happen anyway. There's plenty of little differences, like modulo with negative numbers.)
The footgun is coding in multiple languages. You're in a maze of twisty little passages, all different. ;-) Just remember to use your clutch when you change gears.
So aside from filename extensions, what are the real use cases for suffix removal? Plurals? No, too locale-dependent and too many exceptions. Whitespace left over from external data? No, there are already other functions for that (and regexen and actual parsers if they're not good enough). Directory traversal? No, that's what path instances and the os module are for.
On Wed, 13 Oct 2021 at 19:02, <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
So aside from filename extensions, what are the real use cases for suffix removal? Plurals? No, too locale-dependent and too many exceptions. Whitespace left over from external data? No, there are already other functions for that (and regexen and actual parsers if they're not good enough). Directory traversal? No, that's what path instances and the os module are for.
I think this is a good point. Is removesuffix really useful enough to warrant having an operator *as well as* a string method? It was only added in 3.9, so we've been managing without it at all for years, after all... Paul
On 13.10.2021 20:47, Paul Moore wrote:
I think this is a good point. Is removesuffix really useful enough to warrant having an operator *as well as* a string method? It was only added in 3.9, so we've been managing without it at all for years, after all...
Sure, but that's not evidence that this kind of operation is not common. Some examples:
- removal of file extensions
- removal of end tags
- removal of units
- removal of currencies
- removal of standard suffixes
- removal of wildcard patterns
etc.
I find lots of such uses in the code bases I work with.
-- Marc-Andre Lemburg
eGenix.com
13.10.21 22:03, Marc-Andre Lemburg wrote:
Some examples:
- removal of file extensions
- removal of end tags
- removal of units
- removal of currencies
- removal of standard suffixes
- removal of wildcard patterns
etc.
I find lots of such uses in the code bases I work with.
I have not yet had an opportunity to use removesuffix. The problem is that almost always, when I need to remove a suffix or prefix, I also need to know whether it was removed or not. removesuffix does not help there; it does not even make the code shorter. And I need to support versions older than 3.9.
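A minimal sketch of the pattern Serhiy describes, assuming the caller needs both the result and whether anything was removed, and must run on Pythons older than 3.9 (so it uses `endswith()` and slicing rather than `removesuffix()`). The helper name `cut_suffix` is hypothetical:

```python
from typing import Tuple


def cut_suffix(s: str, suffix: str) -> Tuple[str, bool]:
    """Return (s without suffix, True) if s ends with suffix, else (s, False)."""
    # Guard against an empty suffix: s[:-0] would wrongly return "".
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)], True
    return s, False


name, was_md = cut_suffix("README.md", ".md")
print(name, was_md)                      # README True
print(cut_suffix("README.rst", ".md"))   # ('README.rst', False)
```

The boolean flag is exactly what `removesuffix()` does not return, which is the gap Serhiy points out.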
Greetings list,
Looking at the examples, I'm not sure how well this would play out in the context of just using variables, though:
s = a - s
s = a / c
s = a ~ p
By adding such operators we could potentially make math functions compatible with strings by the way of duck typing, giving some really weird results, instead of errors.
Well, I think that since everything in Python is an object, this example can apply to anything. You need to put it in context. It's not a problem per se.
Kind Regards,
Abdur-Rahmaan Janhangeer
about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com>
github <https://github.com/Abdur-RahmaanJ>
Mauritius
On Tue, Oct 12, 2021 at 05:10:45PM -0700, Jelle Zijlstra wrote:
Your other post mostly attracted sarcastic replies, so I'll be more direct: It's highly unlikely that this will go anywhere.
Jelle, the second part of your sentence may be true, but the first part is not. It is unfair and inaccurate to say that the other post "mostly" attracted sarcastic replies. There was exactly one post that used sarcasm, by Ricky, and that was not biting or aggressive sarcasm, but just a bit of humour: that using unary operators for swapcase and titlecase could save a lot of typing. Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x. -- Steve
On Wed, Oct 20, 2021 at 11:02 AM Steven D'Aprano <steve@pearwood.info> wrote:
Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x.
x = "ß"
:) Okay, so it's *mostly* an invariant. ChrisA
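Chris's counterexample can be checked directly: "ß" has no single-character uppercase form, so `swapcase()` maps it to "SS", and swapping again gives "ss", not the original. A small check of the `~~x == x` invariant, using `str.swapcase()` as a stand-in for the joked-about `~`:

```python
def double_swap_is_identity(s: str) -> bool:
    # Would ~~x == x hold if ~ meant str.swapcase()?
    return s.swapcase().swapcase() == s


print(double_swap_is_identity("Hello"))  # True
print("ß".swapcase())                    # SS
print(double_swap_is_identity("ß"))      # False ("ß" -> "SS" -> "ss")
```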
On Wed, Oct 20, 2021 at 11:10:52AM +1100, Chris Angelico wrote:
On Wed, Oct 20, 2021 at 11:02 AM Steven D'Aprano <steve@pearwood.info> wrote:
Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x.
x = "ß"
:) Okay, so it's *mostly* an invariant.
Hah, well spotted! Ironically, there is an uppercase eszett, 'ẞ', although font support for it may still be limited. (Come on font designers, it has only been official in Unicode since 2008 and in German orthography in 2017). -- Steve
I'm here all week. Tip your wait staff. Also, genuine apologies if mine was perceived as mean-sarcastic. It was definitely sarcastic, but I hoped it was fun enough in tone not to seem mean-spirited. I apologize sincerely and without reservation, and I will do better next time. :)
On Wed, Oct 20, 2021 at 11:35 AM Steven D'Aprano <steve@pearwood.info> wrote:
Ironically, there is an uppercase eszett, 'ẞ', although font support for it may still be limited. (Come on font designers, it has only been official in Unicode since 2008 and in German orthography in 2017).
Yes (and it shows up fine in both my web browser and my terminals), but that only makes swapcase worse.
>>> s = "ẞ"
>>> print(s := s.swapcase())
ß
>>> print(s := s.swapcase())
SS
>>> print(s := s.swapcase())
ss
Fortunately, you can always rely on casefold to make things consistent:
>>> "ẞ".casefold() == "ß".casefold() == "SS".casefold() == "ss".casefold()
True
TBH swapcase is a bit of a minefield if you don't know what language you're working with.
>>> "Iİıi".swapcase()
'ii̇II'
I'm not sure I've ever used it in production. Normally it's just upper(), lower(), or title() for conversions, and casefold() for comparisons. The most logical "negation" of a string would be reversing it, which WOULD be... well, reversible. But that doesn't need an operator, since it already has slice notation. ChrisA
On 2021-10-20 at 11:48:30 +1100, Chris Angelico <rosuav@gmail.com> wrote:
TBH swapcase is a bit of a minefield if you don't know what language you're working with.
[...]
The most logical "negation" of a string would be reversing it, which WOULD be... well, reversible. But that doesn't need an operator, since it already has slice notation.
Slice notation is also a minefield; [some] explosives are combining characters.
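To make the "minefield" concrete: slicing works on code points, so reversing a string that contains a combining accent detaches the accent from its base character. Normalizing to NFC first with `unicodedata.normalize()` helps only where a precomposed character exists; it is not general grapheme-aware reversal, which, as far as I know, the standard library does not provide:

```python
import unicodedata

s = "cafe\u0301"              # "café" spelled as "e" + U+0301 COMBINING ACUTE ACCENT
print(len(s))                 # 5 code points
rev = s[::-1]
print(rev[0] == "\u0301")     # True: the accent now leads, attached to nothing

nfc = unicodedata.normalize("NFC", s)  # composes "e" + U+0301 into U+00E9
print(len(nfc))               # 4
print(nfc[::-1])              # éfac
```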
On Wed, Oct 20, 2021 at 12:02 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
Slice notation is also a minefield; [some] explosives are combining characters.
True, but since all of Python's indexing, slicing, etc, is defined by codepoints, what that really means is that slicing/reversing a string can cause peculiar behaviours. But that's true of many other types of characters too; if you mix LTR and RTL text, with some directionless text in between, you'll see some very peculiar behaviour when you reverse it (try getting an English word, some ASCII digits, and an Arabic word - the digits are the same ones that both languages use). So I don't think combining characters are unique here. "Reversing" text means many different things depending on context. ChrisA
Steven D'Aprano writes:
Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x.
You forgot about Turkish, I think it is, that has three cases (the third is called "title case"). Just another Joke that Broke because Unicode!
On Wed, Oct 20, 2021 at 11:30:50AM +0900, Stephen J. Turnbull wrote:
Steven D'Aprano writes:
Ironically, Ricky's in-fun suggestion that we use the tilde operator for swapcase was the only suggestion in these two threads that actually met the invariant for an inverse that ~~x == x.
You forgot about Turkish,
Yes, I did. Turkish (and a couple of other Turkic languages) have dotted and dotless I, and their case rules differ from the default Unicode rules.
I think it is, that has three cases (the third is called "title case"). Just another Joke that Broke because Unicode!
The Unicode titlecase algorithm is used for languages where digraphs like Lj, Nj, or Dz are classified as single letters of the alphabet. For example, the Polish word Dziewczyna uses the Dz digraph as the first letter. If you use the Unicode code point U+01F3 we get:
'ǳiewczyna'.upper()  # returns 'ǱIEWCZYNA'
'ǳiewczyna'.title()  # returns 'ǲiewczyna'
In case the glyphs don't show up for you, they are dz for the lowercase, DZ for the uppercase, and Dz for the titlecase. swapcase() doesn't use titlecase. Titlecased characters remain unchanged when you use swapcase() on them.
-- Steve
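Steve's digraph example can be reproduced with escaped code points, which avoids depending on font support: U+01F1 (uppercase DZ), U+01F2 (titlecase Dz) and U+01F3 (lowercase dz). The last check covers his claim that `swapcase()` leaves titlecase characters alone:

```python
dz_lower = "\u01f3"  # LATIN SMALL LETTER DZ
dz_title = "\u01f2"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z
dz_upper = "\u01f1"  # LATIN CAPITAL LETTER DZ

word = dz_lower + "iewczyna"
print(word.upper() == dz_upper + "IEWCZYNA")  # True
print(word.title() == dz_title + "iewczyna")  # True
print(dz_title.swapcase() == dz_title)        # True: titlecase is left unchanged
```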
On Tue, Oct 12, 2021 at 5:21 PM Jeremiah Vivian <nohackingofkrowten@gmail.com> wrote:
So I guess I'll just have to keep this to myself.
I know this is disappointing, but in this case I agree with Jelle -- this particular idea does not fit well in Python's design, it looks like an attempt at saving one opcode (or a few characters to type) for a relatively rare use case. However, I recommend that you don't give up! There are many ways Python can still use improvement, and a few more attempts will help you calibrate your ideas with what might be acceptable. I would also like to remind various other posters that sarcasm is *not* a good way to welcome newbies. The name of the list is python-ideas, not python-ideas-to-shoot-down-sarcastically. --Guido van Rossum (python.org/~guido)
On Tue, Oct 12, 2021 at 05:30:13PM -0700, Guido van Rossum wrote:
I would also like to remind various other posters that sarcasm is *not* a good way to welcome newbies. The name of the list is python-ideas, not python-ideas-to-shoot-down-sarcastically.
Guido, it isn't fair of you to jump into this thread and start scolding us for being sarcastic. Jelle's accusation that we "mostly" replied to Jeremiah with sarcasm was inaccurate. The regulars here (especially Chris) spent a lot of time and effort trying to get Jeremiah to understand the need to justify the association between ord(), or some other arbitrary function, with the unary plus operator (or any other arbitrary operator). It is not nice to dismiss that effort as sarcastically shooting the idea down. -- Steve
On Tue, Oct 12, 2021 at 11:50:27PM -0000, Jeremiah Vivian wrote:
I posted a previous thread about overloading the unary `+` operator in strings with `ord`, and that expanded to more than just the unary `+` operator. So I'm saying now, there should be these implementations:
Did you actually read people's comments in that other thread? If you did read that thread, you should have understood that before anyone takes your proposal seriously, you **must** justify why the unary operators should do what you want them to do.
So far, you have suggested:
+"s" == ord("s")
+string = int(string)
+string = string.lstrip()
with no justification for any of these beyond an *assumption* that calling the function ord() might be slower than using a unary operator. (That might be true, maybe, but I doubt it would be significantly slower.)
The point of my earlier email was to make it clear to you how random and arbitrary the choice of ord() for unary plus was, not to convince you to make a different random and arbitrary choice.
What part of `+string` do you think CLEARLY and OBVIOUSLY means "convert the string to an int"? Why would `-string` mean "convert to an int using octal" and `~string` mean base 16? Why not the other way? Why map `+string` to lstrip() and `-string` to rstrip() instead of the other way? All these choices seem random and arbitrary.
You have still not posted any solid justification for why strings should support unary operators. You haven't even said "because I'm lazy and don't want to type a function name". At least that would be a reason. A bad reason, but still better than no reason at all.
So let me be completely frank:
- I think you have zero chance of this proposal being accepted.
- But if you were to have *any* chance at all, even one in a hundred million, you need to start by giving some good reasons why unary operators should be used for strings at all.
- You need to justify the choices. What part of `~string` will make the average Python programmer think of converting to an int in hex, or stripping whitespace?
-- Steve
Now I didn't expect this thread to blow up in replies with alternatives, specifically `str1 / str2` for 'str1.split(str2)' and `seq1 * str` for 'str.join(seq1)'.
participants (15)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Abdur-Rahmaan Janhangeer
- Chris Angelico
- Guido van Rossum
- Jelle Zijlstra
- Jeremiah Vivian
- Marc-Andre Lemburg
- MarylandBall Productions
- MRAB
- Paul Moore
- Peter Ludemann
- Ricky Teachey
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
Steven D'Aprano