[Tutor] Removing control characters
Dinesh B Vadhia
dineshbvadhia at hotmail.com
Thu Feb 19 20:25:45 CET 2009
At the bottom of the recipe at http://code.activestate.com/recipes/303342/ there are list comprehensions for string manipulation, i.e.:
import string
str = 'Chris Perkins : 224-7992'
set = '0123456789'
r = '$'
# 1) Keeping only a given set of characters.
print ''.join([c for c in str if c in set])
> '2247992'
# 2) Deleting a given set of characters.
print ''.join([c for c in str if c not in set])
> 'Chris Perkins : -'
The missing one is
# 3) Replacing a set of characters with a single character, i.e.
for c in str:
    if c in set:
        string.replace (c, r)
to give
> 'Chris Perkins : $$$-$$$$'
My solution is:
print ''.join[string.replace(c, r) for c in str if c in set]
But, this returns a syntax error. Any idea why?
Ta!
Dinesh
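A note on the syntax error above: ''.join[...] uses square brackets where a call needs parentheses, so Python tries to parse it as a subscript and rejects the comprehension inside it. Even with parentheses added, the "if c in set" filter would drop every character that is not a digit, and string.replace() expects the string to search as its first argument. What follows is a minimal sketch of case 3, written for Python 3 (an assumption; the post itself uses Python 2 print statements), with the hypothetical names text, digits and replacement standing in for str, set and r so the built-ins are not shadowed.

```python
text = 'Chris Perkins : 224-7992'
digits = '0123456789'
replacement = '$'

# Keep every character, but swap those found in `digits` for `replacement`.
# The conditional expression goes before `for`, so nothing is filtered out.
masked = ''.join([replacement if c in digits else c for c in text])

print(masked)  # Chris Perkins : $$$-$$$$
```

Putting the condition in front of "for" chooses what to emit for each character, whereas a trailing "if" decides whether the character is emitted at all.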
From: Kent Johnson
Sent: Thursday, February 19, 2009 8:03 AM
To: Dinesh B Vadhia
Cc: tutor at python.org
Subject: Re: [Tutor] Removing control characters
On Thu, Feb 19, 2009 at 10:14 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> I want a regex to remove control characters (< chr(32) and > chr(126)) from
> strings ie.
>
> line = re.sub(r"[^a-z0-9-';.]", " ", line) # replace all chars NOT A-Z,
> a-z, 0-9, [-';.] with " "
>
> 1. What is the best way to include all the required chars rather than list
> them all within the r"" ?
You have to list either the chars you want, as you have done, or the
ones you don't want. You could use
r'[\x00-\x1f\x7f-\xff]' or
r'[^\x20-\x7e]'
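As a rough illustration, the two character classes can be tried side by side. This is a sketch for Python 3 (an assumption), and the sample string and variable names are made up for the example.

```python
import re

# Hypothetical sample: an accented letter, a tab and a DEL character.
line = 'caf\xe9\tau lait\x7f!'

# List the unwanted characters explicitly...
listed = re.compile(r'[\x00-\x1f\x7f-\xff]')
# ...or negate the range of printable ASCII characters.
negated = re.compile(r'[^\x20-\x7e]')

cleaned_listed = listed.sub(' ', line)
cleaned_negated = negated.sub(' ', line)

print(repr(cleaned_listed))   # 'caf  au lait !'
print(repr(cleaned_negated))  # 'caf  au lait !'
```

The two agree on this input. On Python 3 text strings they differ for characters above \xff: the negated class matches them, the listed class does not.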
> 2. How do you handle the inclusion of the quotation mark " ?
Use \", that works even in a raw string.
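A small sketch of the \" approach, assuming Python 3. The pattern is a variation on the one in the question, with the double quote added and the hyphen moved to the end of the class so it cannot be read as a range; the sample line is invented.

```python
import re

# In a raw string, \" keeps the backslash, and the regex engine reads
# the escaped quote as a literal double quote.
pattern = re.compile(r"[^a-z0-9\"';.-]")

line = 'say "hi"; it\'s 9 a.m. (ok)'
cleaned = pattern.sub(" ", line)

print(repr(cleaned))
```

Here only the parentheses change: each one is replaced by a space, and both kinds of quotation mark are kept.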
By the way string.translate() is likely to be faster for this purpose
than re.sub(). This recipe might help:
http://code.activestate.com/recipes/303342/
Kent
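For comparison, here is a minimal sketch of the translate() route. It uses Python 3's str.translate() with a dict table, which is an assumption: the string.translate() mentioned above is the Python 2 function and takes a 256-character table plus a string of characters to delete. The names delete_table and line are hypothetical.

```python
# Map each code point below 0x20, and from 0x7f up to 0xff, to None.
# str.translate() deletes any character whose table entry is None.
delete_table = {code: None for code in range(256)
                if code < 0x20 or code > 0x7e}

line = 'tab\there\x00, bell\x07 gone\x7f'
cleaned = line.translate(delete_table)

print(repr(cleaned))  # 'tabhere, bell gone'
```

Mapping the code points to ' ' instead of None would replace the characters with spaces rather than remove them.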