function to remove and punctuation
Peter Otten
__peter__ at web.de
Sun Apr 10 08:35:55 EDT 2016
geshdus at gmail.com wrote:
> how to write a function taking a string parameter, which returns it after
> you delete the spaces, punctuation marks, accented characters in python ?
It looks like you want to remove more characters than you want to keep. In
that case I'd first decide which characters to keep, e.g. (assuming Python 3):
>>> import string
>>> keep = string.ascii_letters + string.digits
>>> keep
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
Now you can iterate over the characters and check for each one whether you
want to keep it:
>>> def clean(s, keep):
...     return "".join(c for c in s if c in keep)
...
>>> clean("<alpha> äöü ::42", keep)
'alpha42'
>>> clean("<alpha> äöü ::42", string.ascii_letters)
'alpha'
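For use outside the interactive prompt, the same idea might look like the
sketch below. The frozenset and the default argument are my own additions
here (assumptions, not required): membership tests on a set do not scan the
whole string of allowed characters each time.

```python
import string

# Characters to keep. A frozenset is an assumption of this sketch; a plain
# string works too, but "c in keep" then scans the string for every character.
KEEP = frozenset(string.ascii_letters + string.digits)


def clean(s, keep=KEEP):
    """Return s with every character that is not in keep removed."""
    return "".join(c for c in s if c in keep)


print(clean("<alpha> äöü ::42"))  # alpha42
```

Any container that supports "in" can be passed as keep, so
clean(text, string.ascii_letters) still works as before.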
If you are dealing with a lot of text you can make this a bit more efficient
with the str.translate() method. Create a mapping that maps all characters
that you want to keep to themselves
>>> m = str.maketrans(keep, keep)
>>> m[ord("a")]
97
>>> m[ord(">")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 62
and all characters that you want to discard to None
>>> from collections import defaultdict
>>> trans = defaultdict(lambda: None, m)
>>> trans[ord("s")]
115
>>> trans[ord("ß")] # returns None, so nothing is printed
>>>
Now pass it to the translate() method:
>>> "<alpha> äöü ::42".translate(trans)
'alpha42'
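Put together as a script, the steps above could be packaged roughly like
this. The helper names make_table() and clean_translate() are made up for
illustration; only str.maketrans(), str.translate() and defaultdict come from
the session above.

```python
import string
from collections import defaultdict


def make_table(keep):
    """Build a translation table: kept characters map to themselves,
    every other code point maps to None (which means "delete")."""
    return defaultdict(lambda: None, str.maketrans(keep, keep))


def clean_translate(s, table):
    """Remove all characters from s that the table maps to None."""
    return s.translate(table)


# Build the table once and reuse it for every string you process.
table = make_table(string.ascii_letters + string.digits)
print(clean_translate("<alpha> äöü ::42", table))  # alpha42
```

The point of splitting it this way is that the table is built once, while
translate() does the per-character work for each string.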
You changed your mind and want to translate " " to "_"? Here's how:
>>> trans[ord(" ")] = "_"
>>> "<alpha> äöü ::42".translate(trans)
'alpha__42'
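One thing to be aware of: a defaultdict stores an entry for every missing key
that is looked up, so the table grows with each new discarded character it
sees. If that bothers you, a dict subclass with a __missing__() method is a
possible alternative. The sketch below is only one way to do it; the class
name KeepTable and its replacements parameter are hypothetical.

```python
import string


class KeepTable(dict):
    """Translation table that keeps some characters, replaces others,
    and deletes everything else without storing the deleted keys."""

    def __init__(self, keep, replacements=None):
        # Kept characters map to themselves.
        super().__init__(str.maketrans(keep, keep))
        # Optional one-character -> string replacements, e.g. {" ": "_"}.
        for char, repl in (replacements or {}).items():
            self[ord(char)] = repl

    def __missing__(self, key):
        # Called by dict lookup for unknown code points; None means
        # "delete". Nothing is added to the dict.
        return None


table = KeepTable(string.ascii_letters + string.digits, {" ": "_"})
print("<alpha> äöü ::42".translate(table))  # alpha__42
```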