Re: [Python-ideas] str.rreplace

On 25 Jan 2014 04:29, "Andrew Barnert" <abarnert@yahoo.com> wrote:
and
Strings already provide rfind and rindex (they're just not part of the general sequence API). Since strings are immutable, there's also no call for an "rremove". rreplace (pronounced "ar-replace", like "ar-split" et al.) is more obvious than a negative count, and seems like an almost exact parallel to rsplit. On the other hand, I don't recall ever lamenting its absence. Call me +0 on the idea. Cheers, Nick.
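Since rreplace is described as a near-parallel to rsplit, it helps to see that the behaviour can already be spelled with rsplit plus join. This is a minimal sketch. `rreplace` is a hypothetical helper, not an existing str method, and its count semantics (negative means "all") are assumed to mirror str.replace.

```python
def rreplace(s: str, old: str, new: str, count: int = -1) -> str:
    """Replace up to `count` occurrences of `old`, scanning from the right.

    Hypothetical helper (not part of the str API). A negative count means
    "replace every occurrence", mirroring str.replace. An empty `old` makes
    str.rsplit raise ValueError.
    """
    # rsplit with maxsplit=count cuts at the rightmost `count` occurrences;
    # joining the pieces with `new` puts the replacement where `old` was.
    return new.join(s.rsplit(old, count))


print(rreplace("a.b.c.d", ".", "/", 1))  # a.b.c/d
print(rreplace("a.b.c.d", ".", "/"))     # a/b/c/d
```

A count of 0 leaves the string unchanged, since rsplit with maxsplit=0 returns the whole string as a single piece.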

From: Nick Coghlan <ncoghlan@gmail.com> Sent: Friday, January 24, 2014 4:05 PM
I was responding to Serhiy's (probably facetious or devil's advocate) suggestion that we should regularize the API: add rfind and rindex to tuple (and presumably Sequence), and those plus rremove to list (and presumably MutableSequence), and so on. My point was that if we're going to be that radical, we might as well consider removing methods instead of adding them. Some of the find-like methods already take negative indices; expanding that to all of the index-based methods, and doing the equivalent to the count-based ones, and adding a count or index to those that have neither, would mean all of the "r" variants could go away. I think it's pretty obvious that both this suggestion and Serhiy's are not worth doing for Python—the language has had pretty much the same set of find-style methods for decades, most of them are used frequently, and people rarely go looking for any of the "missing" ones, so why change it? (And I think that was Serhiy's point as well, but I don't want to speak for him.) If people _do_ find themselves missing one particular variant, just adding that one more variant is a lot more conservative than changing everything; if not, there's no reason to add anything at all.

On 01/24/2014 07:36 PM, Andrew Barnert wrote:
How about a keyword to specify which end to index from? When used, it would disable negative indexing as well. When not used, the current behaviour with negative indexing would be the default.

    direction=0   # The default with the current (or not specified)
                  # negative indexing allowed.
    direction=1   # From first. Negative indexing disallowed.
    direction=-1  # From last. Negative indexing disallowed.

(A shorter key word would be nice, but I can't think of any that is as clear.) The reason for turning off the negative indexing is it would also offer a way to avoid some indexing bugs as well. (Using negative indexing with a reversed index is just asking for trouble, I think.) While the spelling isn't as short and concise as I would like, I could always wrap them in short helper functions if I wanted... ffind, rfind, findex, rindex.. etc. But those wouldn't need to be added to Python. Cheers, Ron
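A rough sketch of how the proposed three-way keyword might behave for find, written as a plain function on top of the existing str.find and str.rfind. The name `find_dir` and the choice to raise ValueError for negative indices are assumptions for illustration; nothing like this exists in the standard library.

```python
from typing import Optional


def find_dir(s: str, sub: str, start: Optional[int] = None,
             end: Optional[int] = None, direction: int = 0) -> int:
    """Hypothetical find with a 'direction' keyword, per the proposal above.

    direction=0  -> today's str.find, negative indices allowed
    direction=1  -> search from the first character, negative indices rejected
    direction=-1 -> search from the last character, negative indices rejected
    """
    if direction not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0 or 1")
    if direction != 0:
        # The proposal turns negative indexing off whenever a direction is given.
        for idx in (start, end):
            if idx is not None and idx < 0:
                raise ValueError("negative indices are disabled when direction is set")
    if direction == -1:
        return s.rfind(sub, start, end)
    # Both direction=0 and direction=1 search left to right.
    return s.find(sub, start, end)


print(find_dir("abccba", "b"))                # 1
print(find_dir("abccba", "b", direction=-1))  # 4
```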

From: Ron Adam <ron3200@gmail.com> Sent: Monday, January 27, 2014 7:18 PM
(A shorter key word would be nice, but I can't think of any that is as clear.)
Why does it have to be -1/0/1 instead of just True/False? In which case we could use "reverse", the same name that's already used for similar things in other methods like list.sort (and that's implied in the current names "rfind", etc.).
The reason for turning off the negative indexing is it would also offer a way to avoid some indexing bugs as well. (Using negative indexing with a reversed index is just asking for trouble I think.)
But str.rfind takes negative indices today:

    >>> 'abccba'.rfind('b', -5, -3)
    1

Why take away functionality that already works? And of course str.find takes negative indices, and that's actually used in some quick&dirty scripts:

    >>> has_ext = path.find('.', -4)

Of course you could make an argument that any such scripts deserve to be broken…
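Both examples can be checked directly. The trouble with the quick&dirty script is that str.find returns -1 on failure, which is truthy, so using its result as a boolean is the actual bug; the negative start index is fine. The function names below are hypothetical, and the four-character window is the script's own heuristic, not a general rule.

```python
import os.path

# Negative start/end are interpreted as in slice notation: 'abccba'[-5:-3]
# is 'bc', and the 'b' in it sits at index 1 of the original string.
assert 'abccba'.rfind('b', -5, -3) == 1


def has_ext_buggy(path: str) -> bool:
    # Broken: find() returns -1 (truthy) when there is no '.', and it can
    # return 0 (falsy) for a very short path such as '.py'.
    return bool(path.find('.', -4))


def has_ext(path: str) -> bool:
    # Compare against -1 explicitly; the negative start index was never the problem.
    return path.find('.', -4) != -1


def has_ext_stdlib(path: str) -> bool:
    # Sturdier: does not assume extensions are at most 3 characters long.
    # Note that splitext treats a leading-dot name like '.py' as having no extension.
    return os.path.splitext(path)[1] != ""


print(has_ext_buggy("README"), has_ext("README"))  # True False
```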

On 01/27/2014 10:03 PM, Andrew Barnert wrote:
Well, then it would need to be.. True/False/None. The reason it needs three modes is to save the current behaviour and not break anything. Actually I'm about even on whether I like the keyword option or separate functions. Also there's the case of taking a slice from the middle with a positive starting index and a negative ending index. And with the exception of examples, nearly all string slicing uses a right and left value to get characters in the forward order, even if they are indexed from the right. So that gives four modes... left middle right default With the default being what we have now. I wonder if maybe it would be better to do these things with the string format method? That is a higher-level interface, more suitable for adding options to.
It could still work that way.. just don't specify a direction. :-)
I'd say they are already broken in that particular case. ;-) -Ron

On Tue, Jan 28, 2014 at 12:27:31AM -0600, Ron Adam wrote:
What's "it", and how is this relevant to adding a version of replace that operates from the right?
Now we're talking about slices? Providing a positive and negative index to a slice is a well-defined and well-understood operation. "I want everything except the first and last item" => [1:-1].
With the exception of what examples? The rest of your sentence confuses me. Are you talking about extended slicing with a negative stride given? Please don't over-generalise this issue. It's a simple request to add a version of replace that operates from the right, just like rfind operates from the right.
So that gives four modes... left middle right default With the default being what we have now.
What?
You're talking about using a mini-language to control the direction of a replacement operation. That's not just an over-generalisation, it's a hyper-generalisation.
It's broken, but not because of the negative index. -- Steven

On 01/28/2014 05:03 AM, Andrew Barnert wrote:
(Again, here there is no reversal, but backwards iteration; in list.sort, there is reversal. I'd vote for making all such methods use a logical param, if it did not break code [because e.g. rfind is used], along the lines of: l.find(it, backwards=False), or a shorter param name.) d

On Mon, Jan 27, 2014 at 08:03:54PM -0800, Andrew Barnert wrote:
From: Ron Adam <ron3200@gmail.com>
How about a keyword to specify which end to index from?
-1 As a general rule, when you have a function that takes a parameter which selects between two different sets of behaviour, and you normally specify that parameter as a literal or constant known at edit time, then the function should be split into two. E.g.:

    # Good API
    string.upper(), string.lower()
    # Bad API
    string.convert_case(to_upper=True|False)

sorted() and list.sort() (for example) are a counter-example. Sometimes you know which direction you want at edit-time, but there are many use-cases for leaving the decision to run-time. Nearly every application that sorts data lets the user decide which direction to sort. In the case of replace/rreplace, it is more like the upper vs. lower situation than the sorted situation. For almost any reasonable use-case, you will know at edit-time whether you want to go from the left or from the right, so you'll specify the "direction" parameter as an edit-time literal or constant. The same applies to find/rfind.
When used, it would disable negative indexing as well.
-1 Negative indexing is a standard Python feature. There is nothing wrong with negative indexing, no more than there is something wrong with zero-based positive indexing. It's also irrelevant to the replace/rreplace example, since replace doesn't take start/end indexes, and presumably rreplace wouldn't either.
And if you want to operate from the right, with negative indexing allowed? But really, having a flag to decide whether to allow negative indexing is silly. If you don't want negative indexes, just don't use them.
sorted(alist, reverse=True) gives the same result as sorted(alist, reverse=False), only reversed. That is not the case here:

    "Hello world".replace("o", "u", 1, reverse=True)  # rreplace

ought to return "Hello wurld", not "dlrow ulleH".
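The distinction can be made concrete with today's str methods: reversing the output of a forward replace gives "dlrow ulleH", while replacing the last occurrence gives "Hello wurld". This sketch handles only the single-occurrence case via str.rpartition; `replace_last` is a hypothetical helper, not a proposed API.

```python
def replace_last(s: str, old: str, new: str) -> str:
    """Replace only the rightmost occurrence of `old` (hypothetical helper)."""
    head, sep, tail = s.rpartition(old)
    if not sep:  # `old` not found: rpartition returns ('', '', s)
        return s
    return head + new + tail


text = "Hello world"
# What a sorted()-style reverse=True would mean: do the work, then reverse the result.
forward_then_reversed = text.replace("o", "u", 1)[::-1]
# What rreplace is meant to do: pick the match nearest the right-hand end.
from_the_right = replace_last(text, "o", "u")
print(forward_then_reversed)  # dlrow ulleH
print(from_the_right)         # Hello wurld
```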
Exactly. Here, I agree strongly with Andrew. Negative indexing works perfectly well with find/rfind. Slices with negative strides are weird, but negative indexes are well-defined and easy to understand.
It would be an awfully bogus argument. Negative indexes are a well-defined part of Python indexing semantics. One might as well argue that any scripts that rely on list slicing making a copy "deserve to be broken". -- Steven

On Tue, Jan 28, 2014 at 03:07:15PM +0200, Serhiy Storchaka wrote:
Sure. Nothing wrong with them.
# Bad API codecs.encode(data, encoding='hex_codec'|'zlib_codec')
But that's not how the codecs.encode function is usually used. Like my earlier example of sorted(), sometimes you know in advance what encoding you want to use:

    codecs.encode(text, encoding="utf-8")

but for many applications, the encoding parameter is not known until runtime:

    DEFAULT_ENCODING = "utf-8"
    encoding = get_encoding() or DEFAULT_ENCODING
    codecs.encode(text, encoding=encoding)

I can't think of an application where I would want to choose between hex_codec and zlib_codec at runtime, but that's because they are codecs with completely different purposes. A better example might be an application where I choose between compression methods at runtime:

    def get_compression():
        # returns the name of a compression codec
        # e.g. zlib_codec, bz2_codec, xz_codec, lzma_codec
        # some of these may not be in the std lib at this time
        ...

    codecs.encode(data, encoding=get_compression())

So the codecs.encode function does not fail my test of "parameter is nearly always known at edit-time", and it is not a bad API. -- Steven
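The compression example can be made runnable with the two names from that list that do ship with the standard library, zlib_codec and bz2_codec (available when the zlib and bz2 modules are built). `get_compression` and its `preferred` argument are hypothetical stand-ins for a real configuration lookup.

```python
import codecs


def get_compression(preferred=None):
    # Hypothetical: a real application might read this from a config file or
    # a command-line option, i.e. the codec name is only known at runtime.
    return preferred or "zlib_codec"


data = b"spam and eggs " * 50
for choice in (None, "bz2_codec"):
    name = get_compression(choice)
    packed = codecs.encode(data, encoding=name)
    # Round-trip check: decoding with the same codec restores the original bytes.
    assert codecs.decode(packed, encoding=name) == data
    print(name, len(data), "->", len(packed))
```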

On 01/28/2014 06:33 AM, Steven D'Aprano wrote:
You are correct, and I got my methods mixed up this morning... I was thinking of __getitem__ instead of index, and related methods. The issues I was referring to are not directly related, as you pointed out. In most cases I do think having separate functions or methods is better. And in this case it's no different than having partition and rpartition. I think the argument against rreplace and the strangeness of its name comes too late. There are already a fair number of "r" methods. Cheers, Ron

Isn't the only reason that you don't like keyword arguments that take constant values a matter of efficiency? It would be much better to make the compiler smarter than to make the language more verbose. The advantage of keyword arguments is that you never end up with a = x.lstrip(...) if blah else x.rstrip(...) On Tuesday, January 28, 2014 7:33:50 AM UTC-5, Steven D'Aprano wrote:
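Separate methods do not force the conditional-expression style above: when the side is only known at runtime, the method object can be chosen once and then called. `strip_side` and its `from_right` flag are hypothetical; the technique is ordinary use of unbound str methods.

```python
def strip_side(text: str, chars: str, from_right: bool) -> str:
    # Choose the unbound method once, then call it; the argument list is not duplicated.
    strip = str.rstrip if from_right else str.lstrip
    return strip(text, chars)


print(strip_side("xxhixx", "x", from_right=True))   # xxhi
print(strip_side("xxhixx", "x", from_right=False))  # hixx
```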

On Feb 3, 2014, at 14:11, Neil Girdhar <mistersheik@gmail.com> wrote:
Isn't the only reason that you don't like keyword arguments that take constant values a matter of efficiency?
I'm pretty sure it's more about clarity (letting you write what you mean, and making that meaning explicit at a glance) than performance. (On the rare occasions where performance makes a difference, surely you're already copying the method to a local variable anyway, right?) Keyword arguments imply a dynamic choice, different functions a static one, in basically the same way as, say, keyed values vs. attributes. So, if it would be very rare to choose between foo and rfoo dynamically, it makes sense for them to be separate methods; if it's relatively common, it makes sense to have a single method. (I'm not sure that it _would_ be rare, but if so, I agree with the rest of the argument against my point.)

Neil Girdhar writes:
No, they simplify the *default* interface and make *that* easier to learn. Who could be against that? The issue is "what about the case where there's no TOOWTDI default?" Where TOOWTDI is somewhere around "more than 80% of the time for more than 80% of the users when they want a specific value for the parameter every time on this code path" (vs. choosing that value at runtime).

On 28 January 2014 01:18, Ron Adam <ron3200@gmail.com> wrote:
I've just picked up the whole thread at once, and I am a little surprised no one suggested what looks obvious to me (or maybe someone did; I went over the e-mails rather quickly): why not simply allow negative values for the count argument? It is pretty much unambiguous, straightforward (hmm.. actually, the opposite of that), and Pythonistas are used to thinking of negative indices as counting from the right. Moreover, the convention could be used for index, remove, and even overloaded for split and other methods as well. js -><-
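A sketch of the negative-count idea as a standalone function. One caveat worth knowing: str.replace already accepts negative counts and treats them as "replace everything", so adopting this convention on str.replace itself would change the meaning of existing calls. The sketch therefore uses None for "all". `replace_signed` is a hypothetical name, and reversing the strings is just one way to scan from the right.

```python
from typing import Optional


def replace_signed(s: str, old: str, new: str, count: Optional[int] = None) -> str:
    """Hypothetical replace where a negative count works from the right.

    count=None -> replace every occurrence
    count >= 0 -> replace the first `count` occurrences (plain str.replace)
    count < 0  -> replace the last `-count` occurrences
    """
    if count is None:
        return s.replace(old, new)
    if count >= 0:
        return s.replace(old, new, count)
    # Reverse everything, replace from the (new) left, then reverse back.
    # Reversing `old` and `new` as well keeps each match and replacement the right way round.
    return s[::-1].replace(old[::-1], new[::-1], -count)[::-1]


print(replace_signed("a-b-c-d", "-", "+", 2))   # a+b+c-d
print(replace_signed("a-b-c-d", "-", "+", -2))  # a-b+c+d
```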

25.01.14 02:05, Nick Coghlan wrote:
I'm between -0 and +0. On one hand, there are precedents, the meaning of these methods looks clear and consistent with the others, and the cost of adding them is pretty low. On the other hand, the cost is larger than zero, these methods are needed very rarely, and there are other ways to do it. In case of doubt, I think the status quo wins.

participants (10)
- Andrew Barnert
- Joao S. O. Bueno
- MRAB
- Neil Girdhar
- Nick Coghlan
- Ron Adam
- Serhiy Storchaka
- spir
- Stephen J. Turnbull
- Steven D'Aprano