Easily remove characters from a string.

Having researched this as heavily as I am capable with limited experience, I would like to suggest a Python 3 equivalent to string.translate() that doesn't require a table as input. Maybe in the form of str.stripall() or str.replaceall().

My reasoning is that while it is currently possible to easily strip() preceding and trailing characters, and even replace() individual characters from a string, to remove more than one character from anywhere within the string requires (I believe) at its simplest a command like this:

    some_string.translate(str.maketrans('', '', '0123456789'))

In Python 2.* however we could say:

    some_string.translate(None, '0123456789')

My proposal is that if strip() and replace() are important enough to receive methods, then the arguably more common operation (in terms of programming tutorials, if not mainstream development) of just removing all instances of specified numbers, punctuation, or even letters etc. from a list of characters should receive one too.

I wholeheartedly admit that there are MANY other ways to do this (including RegEx and list comprehensions), as listed in the StackOverflow answer below. However, the same could be said for replace() and strip().

http://stackoverflow.com/questions/22187233/how-to-delete-all-instances-of-a...

This is my first suggestion and I welcome any and all feedback; even if this is a silly idea, I would really like to know why it is. I have not seen discussion of this before, but if there is such a discussion I would welcome being directed to it.

Thank you for your time.
Simon
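For reference, the Python 3 idiom from the message can be wrapped in a small helper. `remove_chars` is a hypothetical name used only for illustration; the sketch relies only on the real `str.maketrans` (whose optional third argument lists characters to map to None) and `str.translate` (which drops characters mapped to None).

```python
def remove_chars(text: str, chars: str) -> str:
    """Delete every occurrence of each character in `chars` from `text`.

    Hypothetical helper, not a str method. str.maketrans('', '', chars)
    builds a table mapping each character in `chars` to None, and
    str.translate() deletes characters that map to None.
    """
    table = str.maketrans('', '', chars)
    return text.translate(table)


print(remove_chars('abc123def456', '0123456789'))  # abcdef
```

If the same set of characters is removed many times, building the table once and reusing it avoids rebuilding it on every call.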

I would use a list comprehension even if there were some other way to translate, as it is straightforward. On 10/22/16, Simon Mark Holland <simonmarkholland@gmail.com> wrote:
-- With the simplicity of true nature, there shall be no desire. Without desire, one's original nature will be at peace. And the world will naturally be in accord with the right Way. Tao Te Ching
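A list-comprehension version of the deletion might look like the sketch below. `remove_chars_comprehension` is a hypothetical name; converting the unwanted characters to a set first is a design choice of this sketch so that each membership test is a constant-time lookup.

```python
def remove_chars_comprehension(text: str, chars: str) -> str:
    """Keep only the characters of `text` that are not in `chars`.

    Hypothetical helper. The set is built once; the comprehension then
    filters character by character and str.join reassembles the result.
    """
    unwanted = set(chars)
    return ''.join([c for c in text if c not in unwanted])


print(remove_chars_comprehension('a1b2c3', '0123456789'))  # abc
```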

Understood, and I agree; I have seen someone make a similar argument for using RegEx. Here are my main points...

1) Speed - Built-ins are faster.
2) Standardisation - It is a common task that has MANY ways of being completed.
3) Frequent Task - It is to my mind as useful as str.strip() or str.replace().
... perhaps a lesser point ...
4) Batteries Included - In this case Python 3 is more obtuse than Python 2 in a task which often showcases Python's ease of use. (See the secret message lesson in 'Programming Foundations with Python' for an example.)

Those on this list are the least likely to want this functionality, because each of us could solve this quickly in many different ways, but that doesn't mean we should. It is the tasks we don't think about that I believe often eat up cycles. Like I said, even if this is a bad idea I would like to fully grok why.

Thank you all for your time.

On 23 October 2016 at 02:45, David B <dwblas@gmail.com> wrote:
-- Simon Holland BA Hons, Medan, Indonesia
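The speed claim in point 1 can be measured rather than assumed. The sketch below times three of the approaches mentioned in the thread on an arbitrary sample string; the function names and the sample are illustrative, and relative timings will vary with Python version and input, so no particular ranking is claimed here.

```python
import re
import timeit

# Arbitrary sample input, chosen only for illustration.
TEXT = 'The year 2016 had 366 days and 12 months.' * 100
DIGITS = '0123456789'

# Precompute what each approach needs so only the deletion itself is timed.
TABLE = str.maketrans('', '', DIGITS)
PATTERN = re.compile('[' + re.escape(DIGITS) + ']')
UNWANTED = set(DIGITS)


def with_translate(s: str) -> str:
    return s.translate(TABLE)


def with_regex(s: str) -> str:
    return PATTERN.sub('', s)


def with_comprehension(s: str) -> str:
    return ''.join([c for c in s if c not in UNWANTED])


# The approaches must agree before comparing their timings means anything.
assert with_translate(TEXT) == with_regex(TEXT) == with_comprehension(TEXT)

for func in (with_translate, with_regex, with_comprehension):
    seconds = timeit.timeit(lambda: func(TEXT), number=200)
    print(f'{func.__name__:20} {seconds:.4f}s')
```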

On 22.10.2016 10:34, Simon Mark Holland wrote:
Could you perhaps give a use case for what you have in mind? I usually go straight to the re module for anything that's non-trivial in terms of string manipulation, or use my mxTextTools for more complex stuff. re.sub() would be the natural choice for replacing multiple chars or removing multiple chars in one go. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 23 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
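Removing several characters in one re.sub() call usually means building a character class. The sketch below assumes a hypothetical helper `delete_chars_re`; it uses the real `re.escape` so that characters such as `]`, `^`, `-` or `\` are treated literally inside the class, and it special-cases an empty set because `[]` is not a valid pattern.

```python
import re


def delete_chars_re(text: str, chars: str) -> str:
    """Remove every character in `chars` using a regex character class.

    Hypothetical helper. re.escape() protects characters that would
    otherwise be special inside [...].
    """
    if not chars:
        return text  # '[]' would raise re.error, so there is nothing to do
    return re.sub('[' + re.escape(chars) + ']', '', text)


print(delete_chars_re('+--+*abcd+-*xyz-*+-', '*+-'))  # abcdxyz
```

For repeated use, compiling the pattern once with re.compile() avoids re-parsing it on each call.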

On Sat, Oct 22, 2016 at 03:34:23PM +0700, Simon Mark Holland wrote:
stripall() would not be appropriate: "strip" refers to removing from the front and end of the string, not the middle, and str.strip() already implements a "strip all" functionality:

    py> '+--+*abcd+-*xyz-*+-'.strip('*+-')
    'abcd+-*xyz'

But instead of a new method, why not fix translate() to be more user-friendly? Currently, it takes two method calls to delete characters using translate:

    table = str.maketrans('', '', '*+-.!?')
    newstring = mystring.translate(table)

That's appropriate when you have a big translation table which you are intending to use many times, but it's a bit clunky for single, one-off uses. Maybe we could change the API of translate to something like this:

    def translate(self, *args):
        if len(args) == 1:
            # Same as the existing behaviour.
            table = args[0]
        elif len(args) == 3:
            table = type(self).maketrans(*args)
        else:
            raise TypeError('too many or not enough arguments')
        ...

Then we could write:

    newstring = mystring.translate('', '', '1234567890')

to delete the digits. So we could fix this... but should we? Is this *actually* a problem that needs fixing, or are we just adding unnecessary complexity?
Stripping from the front and back is a very common operation; in my experience, replacing is probably half as common, maybe even less. But deleting is even less common.
I think the reason that deleting characters is common in tutorials is that it is a simple, easy, obvious task that can be programmed by a beginner in just a few lines. I don't think it is actually something that people need to do very often, outside of exercises. -- Steve
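The proposed translate() signature can be tried out today as a free function. `translate_any` is a hypothetical name; the real str.translate does not accept three arguments, so this sketch forwards them to str.maketrans() itself and then applies the resulting table.

```python
def translate_any(s: str, *args) -> str:
    """Sketch of the proposed one-or-three-argument translate().

    Hypothetical helper. One argument: an existing translation table,
    as str.translate takes today. Three arguments: passed to
    str.maketrans(x, y, z) to build the table first.
    """
    if len(args) == 1:
        table = args[0]  # same as the existing behaviour
    elif len(args) == 3:
        table = str.maketrans(*args)
    else:
        raise TypeError('too many or not enough arguments')
    return s.translate(table)


print(translate_any('abc123', '', '', '0123456789'))        # abc
print(translate_any('abc', str.maketrans('abc', 'xyz')))   # xyz
```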

On 22/10/2016 at 10:34, Simon Mark Holland wrote:
This actually could be implemented directly in str.replace() without breaking the API by accepting:

    "stuff".replace('a', '')
    "stuff".replace(('a', 'b', 'c'), '')
    "stuff".replace(('a', 'b', 'c'), ('?', '*', ''))

A pure Python implementation looks like this:

https://github.com/Tygs/ww/blob/dev/src/ww/wrappers/strings.py#L229

(this implementation also allows regexes, which is not what you want for the builtin replace(), however, as it would break the performance expectations)

I often had the use case of needing to strip many strings, so I would +1 having a nice and easy way to do it.
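The argument shapes described in this message can be sketched as a free function. `replace_many` is hypothetical (str.replace does not accept tuples), and this naive version simply chains str.replace calls in the order given, which makes the result order-dependent: text produced by one replacement can be rewritten by a later one.

```python
def replace_many(s: str, old, new) -> str:
    """Sketch of a tuple-accepting replace().

    Hypothetical helper. `old` may be a str or a tuple of str; `new` may
    be a single str (used for every `old`) or a tuple of equal length.
    Replacements are applied one after another, left to right.
    """
    olds = (old,) if isinstance(old, str) else tuple(old)
    news = (new,) * len(olds) if isinstance(new, str) else tuple(new)
    if len(olds) != len(news):
        raise ValueError('old and new must have the same length')
    for o, n in zip(olds, news):
        s = s.replace(o, n)
    return s


print(replace_many('abcdef', ('a', 'b', 'c'), ''))               # def
print(replace_many('abcdef', ('a', 'b', 'c'), ('?', '*', '')))  # ?*def
```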

On Mon, Oct 24, 2016 at 8:21 AM, Michel Desmoulin <desmoulinmichel@gmail.com> wrote:
+1 -- I have found I need to do this often enough that I've wondered why it's not there. Making three calls to replace() isn't too bad, but is clunky and has performance issues. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, Oct 25, 2016 at 4:48 AM, Chris Barker <chris.barker@noaa.gov> wrote:
And it may not be semantically identical. In the examples above, three separate replace calls would work, but a syntax like this ought to be capable of an exchange - "aabbccdd".replace(('b', 'd'), ('d', 'b')) == "aaddccbb". ChrisA
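The exchange example shows why chained replace() calls are not equivalent to a simultaneous replacement. The sketch below contrasts the two; `replace_simultaneous` is a hypothetical helper that builds one regex alternation of the escaped old strings and substitutes each match via a dict lookup, so replaced text is never rescanned. It assumes the old strings are non-empty and distinct.

```python
import re


def replace_simultaneous(s: str, old: tuple, new: tuple) -> str:
    """Replace every old[i] with new[i] in a single left-to-right pass.

    Hypothetical helper. Because re.sub never rescans its own output,
    a swap such as b <-> d works as expected.
    """
    if len(old) != len(new):
        raise ValueError('old and new must have the same length')
    if not old:
        return s
    mapping = dict(zip(old, new))
    pattern = re.compile('|'.join(re.escape(o) for o in old))
    return pattern.sub(lambda m: mapping[m.group(0)], s)


text = 'aabbccdd'
# Chained calls: 'b' -> 'd' first, then every 'd' (old and new) -> 'b'.
print(text.replace('b', 'd').replace('d', 'b'))              # aabbccbb
print(replace_simultaneous(text, ('b', 'd'), ('d', 'b')))    # aaddccbb
```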

On Mon, Oct 24, 2016 at 5:56 PM, Chris Angelico <rosuav@gmail.com> wrote:
What would be the expected behavior of "aabbccdd".replace(('a', 'aa'), ('x', 'y'))? It's not obvious to me whether longer replacement strings ('aa') or earlier replacement strings ('a') should take priority. Or is the proposal to only support this for replacements of single characters? Nathan
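The two candidate answers to this question can be made concrete. Python's re tries alternatives separated by `|` from left to right and accepts the first that matches, so the order of the alternation decides the policy. Both functions below are hypothetical helpers illustrating the two options, not a proposal for which one replace() should use.

```python
import re


def replace_first_listed(s, pairs):
    """Earlier pairs win: the alternation keeps the order given."""
    mapping = dict(pairs)
    pattern = '|'.join(re.escape(old) for old, _ in pairs)
    return re.sub(pattern, lambda m: mapping[m.group(0)], s)


def replace_longest_first(s, pairs):
    """Longer keys win: sort the alternation by key length, longest first."""
    ordered = sorted(pairs, key=lambda p: len(p[0]), reverse=True)
    return replace_first_listed(s, ordered)


pairs = [('a', 'x'), ('aa', 'y')]
print(replace_first_listed('aabbccdd', pairs))   # xxbbccdd
print(replace_longest_first('aabbccdd', pairs))  # ybbccdd
```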

On Tue, Oct 25, 2016 at 9:11 AM, Nathan Schneider <neatnate@gmail.com> wrote:
I'm actually not sure, so I would look at prior art. But in any case, this is a question you can't even ask until replace() accepts multiple arguments. Hence I'm +1 on the notion of simultaneous replacements being supported. ChrisA

On Oct 24, 2016, at 3:54 PM, Chris Angelico <rosuav@gmail.com> wrote:
Agreed -- there are a lot of edge cases to work out, and there is not one way to define the API, but if folks think it's a good idea, we can hash those out. If anyone decides to take this on, be prepared for a lot of bike shedding! -CHB

On Mon, Oct 24, 2016 at 05:37:29PM -0700, Chris Barker - NOAA Federal wrote:
Regarding prior art, I think that the PHP ``strtr`` function is a good example: http://php.net/manual/en/function.strtr.php

Especially with regard to the ``replace_pairs`` argument:

    If given two arguments, the second should be an array in the form array('from' => 'to', ...). The return value is a string where all the occurrences of the array keys have been replaced by the corresponding values. The longest keys will be tried first. Once a substring has been replaced, its new value will not be searched again.

This is one I have sometimes used when writing a mini template language, where `{{ username }}` had to be replaced. In contrast to other ways, ``strtr`` gives a one-pass guarantee, which meant that it was safe against hypothetical attacks where one would add a template-string to one of the values.
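A rough Python emulation of the behaviour quoted from the PHP manual (longest keys tried first, replaced text never searched again) is sketched below. `strtr` here is a hypothetical Python function, not part of any Python library, and skipping empty keys is an assumption of this sketch. The template example shows the one-pass property: a value that looks like a placeholder is left alone, whereas chained replace() calls would expand it.

```python
import re


def strtr(s: str, replace_pairs: dict) -> str:
    """Emulate PHP's strtr($str, $replace_pairs) in a single pass.

    Hypothetical helper. Keys are sorted longest first so the regex
    alternation prefers longer matches; re.sub never rescans its output.
    Empty keys are ignored (an assumption of this sketch).
    """
    keys = sorted((k for k in replace_pairs if k), key=len, reverse=True)
    if not keys:
        return s
    pattern = re.compile('|'.join(map(re.escape, keys)))
    return pattern.sub(lambda m: replace_pairs[m.group(0)], s)


template = 'Hello {{ username }}, your role is {{ role }}.'
values = {
    '{{ username }}': '{{ role }}',  # a value that looks like a placeholder
    '{{ role }}': 'admin',
}
print(strtr(template, values))
# Hello {{ role }}, your role is admin.

# Chained replace() calls re-expand the injected placeholder:
print(template.replace('{{ username }}', '{{ role }}').replace('{{ role }}', 'admin'))
# Hello admin, your role is admin.
```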

Participants (10):
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- David B
- M.-A. Lemburg
- Michel Desmoulin
- Nathan Schneider
- Simon Mark Holland
- Sjoerd Job Postmus
- Steven D'Aprano