string.replace should accept a list as a first argument

I think string.replace should be changed to accept a list as a first argument. That way, if I had this string: "There are a lot of undesirable people in this filthy world", then I could do this, replace(['undesirable', 'filthy'], ''), in case that's what I wanted to do. Now, string.replace doesn't accept a list as its first argument, and complains about implicit conversion. Is there any great obstacle to just having the function loop over that list, calling itself in case we get a list argument instead of a str? Doesn't that seem like the more obvious behaviour? To me the results of running the above code should be unsurprising, if this change was implemented: "There are a lot of people in this world". / Emil Petersen

On 06.10.2015 21:25, Emil Rosendahl Petersen wrote:
I think the "one obvious way" of doing a multi-replace is to use the re module, since implementing this efficiently is non-trivial. String methods are meant to be basic (high performance) operations. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On Tue, Oct 06, 2015 at 06:34:16PM +0200, M.-A. Lemburg wrote:
[Emil]
Is there any great obstacle to just having the function loop over that list, calling itself in case we get a list argument instead of a str?
Looping over each replacement item is the wrong solution. Think of the result when one of the search strings is a substring of the replacement:

    py> source = "I ate a chicken salad, and she had a ham sandwich."
    py> for term in ["ham", "turkey", "chicken", "spam"]:
    ...     source = source.replace(term, "spam and cheese")
    ...
    py> print(source)
    I ate a spam and cheese and cheese salad, and she had a spam and cheese and cheese sandwich.

You need to be a bit more careful about how to do the replacements.
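A single pass avoids the problem: every search string is matched against the original text only, so a freshly inserted "spam" is never scanned again. A minimal sketch using `re.sub` with an escaped alternation (the variable names are illustrative):

```python
import re

source = "I ate a chicken salad, and she had a ham sandwich."
terms = ["ham", "turkey", "chicken", "spam"]

# Escape each term so characters such as "." or "+" match literally, then
# combine them into one alternation that is applied in a single pass.
pattern = "|".join(re.escape(term) for term in terms)
result = re.sub(pattern, "spam and cheese", source)

print(result)
# I ate a spam and cheese salad, and she had a spam and cheese sandwich.
```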
[MAL]
A similar issue was discussed last month, in the context of str.split rather than replace, and I talked about the pitfalls of using the re module:

https://mail.python.org/pipermail/python-ideas/2015-September/036586.html

The implementation isn't hard, but it's just tricky enough that some people will get it wrong, and just useful enough that a helper function will be a good idea. The question is, should that helper function be a string method, in the standard library, or merely something that you add to your own projects?

    import re

    def replace_all(source, old, new, count=None):
        if isinstance(old, str):
            # str.replace treats a count of -1 as "no limit"; None is rejected
            return source.replace(old, new, -1 if count is None else count)
        elif isinstance(old, tuple):
            # One escaped alternation, so every term is matched in a single pass
            regex = '|'.join(re.escape(s) for s in old)
            # re.split treats maxsplit=0 as "no limit"
            return new.join(re.split(regex, source, maxsplit=0 if count is None else count))
        raise TypeError('old must be a str or a tuple of str')

-- Steve

Would you happen to have anything to do with the creation of the regex module? Just curious. ;) On October 6, 2015 9:24:40 PM CDT, MRAB <python@mrabarnett.plus.com> wrote:
-- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. CURRENTLY LISTENING TO: Prison of Beauty (Kirby Triple Deluxe) by Jun Ishikawa, Hirokazu Ando

    import re
    result = re.sub('undesirable|filthy', '', 'There are a lot of undesirable people in this filthy world')
    print(result)  # There are a lot of  people in this  world  (note the doubled spaces)

On Tue, Oct 6, 2015 at 2:25 PM, Emil Rosendahl Petersen <emilrosendahlpetersen@outlook.com> wrote:
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/

On 06.10.2015 18:37, Ryan Gonzalez wrote:
Yes, but(TM): The OP has a list, not a string. Yes, he could create one by '|'.join(['undesirable', 'filthy']) but that's like: "Hey, I have some structured data, let's create some applesauce and have a secondary parser re-create structured data from it." IMHO that feels wrong. Best, Sven

"Sven R. Kunze" <srkunze@mail.de> writes:
Not sure how to get around that, other than by creating a general "structured regex" module, to build a compiled regex from an abstract regex syntax tree rather than a string. Which actually might not be the worst thing in the world. I think there are some Lisp dialects that have something like that.

On 2015-10-07 01:59, Chris Angelico wrote:
There's always this: https://pypi.python.org/pypi/regex Look at "Named lists". ;-)

Chris Angelico <rosuav@gmail.com> writes:
Not really. My idea was that no string in regex syntax ever exists. Instead, something like "(?:a|b|c)d*" becomes more like Sequence(Alternate("a", "b", "c"), Star("d")), which the regex engine could then use instead of parsing a string to build a state machine. All strings would be literal strings: "a\.b" would become "a.b", and "a.b" would become Sequence("a", Dot(), "b"). Then you could just use Alternate(*lst).
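Python's `re` only compiles pattern strings, so the following is just a sketch of the idea under that constraint. The classes `Sequence`, `Alternate`, `Star` and `Dot` are hypothetical (their names are taken from the message above), every plain string in the tree is a literal, and the hypothetical `compile_tree` lowers the tree to an escaped pattern string. A real implementation would build the state machine from the tree directly instead of going through a string.

```python
import re


# Hypothetical node types, named after the ones in the message above.
# A plain Python str anywhere in the tree is always literal text.
class Node:
    def to_pattern(self):
        raise NotImplementedError


def lower(item):
    """Turn a literal string or a Node into re pattern syntax."""
    if isinstance(item, str):
        return re.escape(item)  # literals never need hand-escaping
    return item.to_pattern()


class Dot(Node):
    def to_pattern(self):
        return "."


class Sequence(Node):
    def __init__(self, *items):
        self.items = items

    def to_pattern(self):
        return "".join(lower(item) for item in self.items)


class Alternate(Node):
    def __init__(self, *options):
        self.options = options

    def to_pattern(self):
        return "(?:" + "|".join(lower(option) for option in self.options) + ")"


class Star(Node):
    def __init__(self, item):
        self.item = item

    def to_pattern(self):
        return "(?:" + lower(self.item) + ")*"


def compile_tree(node):
    # Assumption: re only accepts pattern strings, so this sketch lowers the
    # tree to an escaped string rather than handing the tree to the engine.
    return re.compile(lower(node))


tree = Sequence(Alternate("a", "b", "c"), Star("d"))
print(lower(tree))  # (?:a|b|c)(?:d)*
print(compile_tree(Sequence("a.b")).fullmatch("axb"))  # None: "." is literal here
print(compile_tree(Sequence("a", Dot(), "b")).fullmatch("axb") is not None)  # True
```

With this shape, a list of words becomes `Alternate(*lst)` with no manual joining or escaping.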

On October 6, 2015 9:14:54 PM CDT, Random832 <random832@fastmail.com> wrote:
Isn't that kind of like PyParsing with a DFA? ...and that's a really cool idea. Completely coincidentally, I've been working on a JIT-ted regex library in C. There's no parser yet, and I was planning on de-exposing the internal structures, but, after reading this, I'll probably just leave it exposed.

On Tue, Oct 6, 2015, at 22:19, Ryan Gonzalez wrote:
Be careful about validating the resulting structure, though. Circular references could be dangerous. I did manage to find the lisp stuff I mentioned: http://scsh.net/docu/html/man-Z-H-7.html http://www.ccs.neu.edu/home/shivers/papers/sre.txt http://srfi.schemers.org/srfi-115/srfi-115.html https://common-lisp.net/~loliveira/ediware/cl-ppcre/doc/ I think the scsh one was the one I'd actually seen before posting the idea.

On 2015-10-06 20:25, Emil Rosendahl Petersen wrote:
Looping over the list is the wrong way to do it because each pass might produce a string that leads to different matches for the subsequent passes. I think that the replacements should be done based on the earliest longest match.
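Python's `re` alternation is ordered rather than longest-match: at each position it takes the first alternative that matches. A minimal sketch of the earliest-longest rule for literal search strings, using a hypothetical helper name `replace_earliest_longest`, sorts the alternatives longest-first so that the leftmost match is also the longest one starting there:

```python
import re


def replace_earliest_longest(text, olds, new):
    """Replace every occurrence of any string in `olds` with `new` in one pass.

    Hypothetical helper. The regex engine scans left to right, so the earliest
    match wins; sorting the literals longest-first makes the alternation pick
    the longest candidate that starts at that position.
    """
    ordered = sorted((old for old in olds if old), key=len, reverse=True)
    if not ordered:
        return text  # an empty pattern would match between every character
    pattern = "|".join(re.escape(old) for old in ordered)
    # A function replacement is used verbatim; a str would have its
    # backslash escapes (e.g. "\1") interpreted by re.sub.
    return re.sub(pattern, lambda match: new, text)


text = "ham and hamburger"
print(re.sub("ham|hamburger", "X", text))                          # X and Xburger
print(replace_earliest_longest(text, ["ham", "hamburger"], "X"))  # X and X
```

The unsorted pattern shows the pitfall: "ham" is listed first, so it wins inside "hamburger" even though the longer string also matches there.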

On 06.10.2015 21:25, Emil Rosendahl Petersen wrote:
I think string.replace should be changed to accept a list as a first argument.
I think I can understand the sentiment here. However, I would prefer a str.replace_all method that would accept it. I cannot think of any place where I would want to use either a str or a list of str, so this could hide bugs.

Speaking of "replace", sometimes I would love to pass a "replace all of these with all of those" dict, which is then processed internally. I can remember two times where we needed to write some kind of for-loop, which actually might produce wrong results:

    convert_dict = {
        '1': '2',
        '2': '3',
    }
    original = '12345'
    for from_, to in convert_dict.items():
        original = original.replace(from_, to)

Real-world examples don't really have this issue, so we accepted this kind of workaround for smaller scripts. But I would prefer a stdlib solution for this.

Best,
Sven

On Tue, Oct 6, 2015 at 1:35 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Also, you can do:

    re.sub('|'.join(convert_dict.keys()), lambda x: convert_dict[x.group(0)], original)

which is probably better, as it performs the match at once. But of course you have to know ahead of time what the keys of the dict may be, as they need to be escaped etc.

Matt
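A sketch of the same idea with the escaping filled in. The helper name `replace_from_dict` is hypothetical, and sorting the keys longest-first is an added assumption so that overlapping keys such as "ab" and "a" resolve predictably:

```python
import re


def replace_from_dict(text, mapping):
    """Apply every old -> new pair in `mapping` simultaneously, in one pass.

    Hypothetical helper: keys are escaped so they match literally, and
    sorted longest-first so the longest key wins when several start at the
    same position.
    """
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # Each match is looked up and substituted once; the output is never
    # re-scanned, so a "2" produced from "1" cannot then become "3".
    return pattern.sub(lambda match: mapping[match.group(0)], text)


convert_dict = {'1': '2', '2': '3'}
print(replace_from_dict('12345', convert_dict))  # 23345

# For comparison, the sequential loop (dicts keep insertion order on 3.7+):
looped = '12345'
for from_, to in convert_dict.items():
    looped = looped.replace(from_, to)
print(looped)  # 33345
```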

On Oct 6, 2015, at 12:25, Emil Rosendahl Petersen <emilrosendahlpetersen@outlook.com> wrote:
I think string.replace should be changed to accept a list as a first argument.
Just a list? If I call it with a tuple, or some other kind of iterable, should I get a TypeError telling me to pass a str or list? But of course str is also an iterable of str, which makes it ambiguous if you do otherwise. Python does have a few functions that can take either one Foo or multiple Foos (and Foo may itself be iterable), but everywhere else it uses tuple as the special type, not list.
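The tuple convention is visible in the standard library: `str.startswith`, `str.endswith` and `isinstance` accept either a single item or a tuple of items, and reject a list. A short demonstration:

```python
# str.startswith and str.endswith take either one str or a tuple of str.
print("filthy".startswith(("dirty", "fil")))  # True
print("filthy".endswith("thy"))                # True

# A list is rejected rather than guessed at.
try:
    "filthy".startswith(["dirty", "fil"])
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)  # False

# isinstance uses the same convention for "one type or several types".
print(isinstance(3, (int, str)))  # True
```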
Even without the multiple arguments, this is a very strange use of replace. For one thing, it leaves extra spaces behind. Also, I suspect you'd be tempted to use it similarly if you wanted to remove "filth", and then complain that it turned "filthy" into "y". I suspect that, in your actual motivating code, you want to either split the string, filter it, and rejoin it, or use regular expressions or a more complicated parser.
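A minimal sketch of the split, filter and rejoin approach, using a hypothetical `remove_words` helper. It assumes words are separated by whitespace and does not handle punctuation attached to a word, such as "world,":

```python
def remove_words(text, unwanted):
    """Hypothetical helper: drop whole words that appear in `unwanted`.

    Splitting on whitespace means "filth" only removes the word "filth",
    never part of "filthy", and no doubled spaces are left behind.
    """
    unwanted = set(unwanted)
    return " ".join(word for word in text.split() if word not in unwanted)


sentence = "There are a lot of undesirable people in this filthy world"
print(remove_words(sentence, ["undesirable", "filthy"]))
# There are a lot of people in this world
print(remove_words(sentence, ["filth"]))
# There are a lot of undesirable people in this filthy world
```

Note that `str.split()` with no argument also collapses runs of whitespace, including newlines, into single spaces in the rejoined result.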
There are various inconsistent things that replace could do with multiple arguments, but I think this is the one you'd be least likely to want when they differ, and that would surprise people the most. Consider cases like removing "fily" and "th": should removing the "th" from "filthy" make it eligible to have the remaining "fily" replaced? Does it matter which order they're passed in the list? What if the replacement string isn't "" but "c"; does that mean "filcy" can be replaced? Of course as long as you think of a string as a sequence of words rather than a sequence of characters, you don't think of these issues—which is exactly why I think you should probably be splitting the string into words, so you don't have to. Or, if you really do want to do this character by character, you need to think through what you mean by how it affects order, greediness, etc. Often, the simplest way to express that is a regular expression, in which case, just do that. If you explicitly want something that's hard to express in a regexp, it's probably uncommon enough that you don't want it as a method on str, and want to have the logic in clear Python code for later reading.
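The order question can be checked directly. Chained `str.replace` calls give different answers depending on order, while a single `re.sub` pass matches only against the original text and gives the same answer for either order:

```python
import re

word = "filthy"

# Sequential replacement: the result depends on the order of the list.
a = word.replace("th", "").replace("fily", "")
b = word.replace("fily", "").replace("th", "")
print(repr(a), repr(b))  # '' 'fily'

# A single regex pass matches only against the original text, so the
# "fily" created by removing "th" is never eligible for replacement.
c = re.sub("fily|th", "", word)
d = re.sub("th|fily", "", word)
print(repr(c), repr(d))  # 'fily' 'fily'
```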

participants (10)
- Andrew Barnert
- Chris Angelico
- Emil Rosendahl Petersen
- M.-A. Lemburg
- Matthew Einhorn
- MRAB
- Random832
- Ryan Gonzalez
- Steven D'Aprano
- Sven R. Kunze