[Tutor] finding special character string

Kent Johnson kent37 at tds.net
Tue Jun 3 11:47:23 CEST 2008


On Tue, Jun 3, 2008 at 5:13 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> Yes, I'm happy because I found a non-regex way to solve the problem (see
> below).

> How did I solve it?  I found a list of all the special words, created a set
> of special words and then checked if each word in the text belonged to the
> set of special words.  If we assume that the list of special words doesn't
> exist then the problem is interesting in itself to solve.

Even with the list of special words I would still use a regex and
process the whole file at once.  If the list is in the variable
'specials' and the file data in 'data', then build and apply a regex
like this:

import re
specialsRe = re.compile('|'.join(r'\.%s\.' % re.escape(s) for s in specials))
data = specialsRe.sub('', data)

The code that builds the regex escapes any regex metacharacters in
each word, brackets the word with literal "." characters, and joins
the alternatives with "|" in between.
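
As a rough, self-contained sketch of that approach: the word list and
sample text below are made up for illustration (the real 'specials' and
'data' are not shown in this thread), and 'cleaned' is just a name
chosen here for the result.

```python
import re

# Hypothetical inputs, assumed for this sketch only.
specials = ['foo', 'c++', 'bar-baz']
data = 'Use .foo. or .c++. here, but not .other. words.'

# Escape each word, wrap it in literal dots, and join the alternatives with "|".
pattern = '|'.join(r'\.%s\.' % re.escape(s) for s in specials)
specialsRe = re.compile(pattern)

# Pattern.sub takes the replacement first, then the string to search.
cleaned = specialsRe.sub('', data)
print(cleaned)
```

Only the dotted words that appear in the list are removed; '.other.' is
left alone because it is not in 'specials'. The re.escape() call matters
for entries such as 'c++', where "+" would otherwise be read as a
quantifier.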

But without the list of specials it is still easy with a regex. This
works on your explanatory text:

data = re.sub(r'\.[a-zA-Z-]{2,}\.', '', data)
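
A small sketch of how the list-free pattern behaves, using made-up
sample strings (they are assumptions, not the text from the original
question). It removes any run of two or more letters or hyphens that is
enclosed in dots.

```python
import re

# Same pattern as above: a dot, two or more letters/hyphens, a dot.
dotted = re.compile(r'\.[a-zA-Z-]{2,}\.')

# Hypothetical sample text, assumed for this sketch only.
sample = 'Keep this .special. and .two-part. but not a.b or 3.14.'
cleaned = dotted.sub('', sample)
print(cleaned)

# The pattern knows nothing about context, so other dotted text can match too.
hostname = dotted.sub('', 'www.example.com')
print(hostname)
```

The second call shows a possible pitfall: in 'www.example.com' the
middle part '.example.' also fits the pattern and is stripped. If the
real data can contain hostnames or similar dotted text, the explicit
word list is probably the safer choice.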

Kent

