stripping unwanted chars from string
Edward Elliott
nobody at 127.0.0.1
Wed May 3 23:36:57 EDT 2006
I'm looking for the "best" way to strip a large set of chars from a filename
string (my definition of best usually means succinct and readable). I
only want to allow alphanumeric chars, dashes, and periods. This is what I
would write in Perl (bless me father, for I have sinned...):
$filename =~ tr/a-zA-Z0-9_.-//cd, or equivalently
$filename =~ s/[^\w.-]//g
I could just use re.sub like the second example, but that's a bit of overkill.
I'm trying to figure out if there's a good way to do the same thing with
string methods. string.translate seems to do what I want; the problem is
specifying the set of chars to remove. Obviously hardcoding them all is a
non-starter.
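For comparison, the re.sub route mentioned above might look roughly like the sketch below. It assumes Python 3, where re.ASCII is needed to keep \w limited to [a-zA-Z0-9_]; the names clean_filename and _UNWANTED are made up for illustration. Note that \w also lets underscores through, just as the Perl version does.

```python
import re

# Matches any character that is NOT an ASCII word char, a period, or a dash.
# re.ASCII restricts \w to [a-zA-Z0-9_] (an assumption about the desired set).
_UNWANTED = re.compile(r"[^\w.-]", re.ASCII)

def clean_filename(filename):
    """Return filename with every disallowed character removed."""
    return _UNWANTED.sub("", filename)

print(clean_filename("my file (v2).tar.gz"))  # myfilev2.tar.gz
```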
Working with chars seems to be a bit of a pain. There's no equivalent of
the range function for chars; one has to do something like this:
>>> [chr(x) for x in range(ord('a'), ord('z')+1)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Do that twice for letters, once for numbers, add in a few others, and I get
the chars I want to keep. Then I'd invert the set and call translate.
It's a mess and not worth the trouble. Unless there's some way to expand a
compact representation of a char list and obtain its complement, it looks
like I'll have to use a regex.
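The keep-then-invert idea described above can be sketched without much ceremony if the string module's constants stand in for the hand-built ranges. This is only a sketch under some assumptions: it targets Python 3, treats the filename as a byte string (so the complement is taken over the 256 possible byte values), and the names KEEP, TOSS and strip_unwanted are invented for the example.

```python
import string

# Bytes to keep: ASCII letters, digits, underscore, period, dash.
KEEP = (string.ascii_letters + string.digits + "_.-").encode("ascii")

# Complement of KEEP over all 256 possible byte values.
TOSS = bytes(b for b in range(256) if b not in KEEP)

def strip_unwanted(name):
    """Delete every byte of `name` that is not in KEEP."""
    # With a table of None, translate() only performs the deletion.
    return name.translate(None, TOSS)

print(strip_unwanted(b"my file*name?.txt"))  # b'myfilename.txt'
```

The complement is computed once at import time, so each call is a single translate pass over the name.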
Ideally, there would be a mythical charset module that works like this:
>>> keep = charset.expand (r'\w.-') # or r'a-zA-Z0-9_.-'
>>> toss = charset.invert (keep)
Sadly I can find no such beast. Anyone have any insight? As of now,
regexes look like the best solution.
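The imagined charset helpers above are not hard to approximate by hand. The sketch below is purely hypothetical: expand and invert are made-up functions, not part of any real module; expand understands only the 'a-zA-Z0-9_.-' style of spec (no \w shorthand); and invert assumes a 7-bit ASCII universe unless told otherwise.

```python
def expand(spec):
    """Expand a spec such as 'a-zA-Z0-9_.-' into a set of characters.

    A dash between two characters denotes an inclusive range; a dash at
    the very start or end of the spec is taken literally.
    """
    chars = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars

def invert(keep, universe=range(128)):
    """Return the characters of `universe` (code points) not in `keep`."""
    return {chr(c) for c in universe} - set(keep)

keep = expand("a-zA-Z0-9_.-")
toss = invert(keep)

# Feed the inverted set to str.translate as the characters to delete.
table = str.maketrans("", "", "".join(sorted(toss)))
print("my file*name?.txt".translate(table))  # myfilename.txt
```

Because the universe is ASCII-only here, non-ASCII characters would pass through untouched; filtering with `c in keep` instead of translating would avoid that.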