[Tutor] Why doesn't this regex match???

Sheila King sheila@thinkspot.net
Fri, 08 Feb 2002 22:53:08 -0800

On Sat, 09 Feb 2002 01:25:25 -0500, Tim Peters <tim.one@comcast.net>
wrote about RE: [Tutor] Why doesn't this regex match???:

...<excellent summary snipped--now I understand what wasn't working >...

> Here's another approach to your problem:
> 1. Convert all your phrases to (say) lowercase first.
> 2. Say the list is spamphrases, and the subject line is "subject".
>    Then
>        s = subject.lower()
>        isjunk = 0
>        for phrase in spamphrases:
>            if s.find(phrase) >= 0:
>                isjunk = 1
>                break
>    is worth considering.  It won't tie your head in knots, anyway.

We were doing something like this before. Actually, what we were doing
was more like:

        s = subject.lower()
        s = ' ' + s + ' '
        isjunk = 0
        for phrase in spamphrases:
            if s.find(phrase) >= 0:
                isjunk = 1
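
The padding step presumably exists so that a phrase stored with a leading
and trailing space matches only as a whole word, even at the very start or
end of the subject. A minimal sketch of that idea follows; the phrase list
and the function name is_junk are made up for illustration and are not from
the original filter:

```python
# Hypothetical phrase list (an assumption): each phrase is lowercase and
# wrapped in spaces so that it only matches as a whole word.
spamphrases = [" free ", " act now "]

def is_junk(subject, phrases):
    # Pad with spaces so a phrase like " free " can also match when the
    # word is the first or last one in the subject line.
    s = " " + subject.lower() + " "
    for phrase in phrases:
        if phrase in s:
            return True  # stop at the first hit
    return False
```

With this list, a subject of "FREE" is flagged, while "Freedom rings" is
not, because " free " never appears in the padded string.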

Then someone else suggested the regular expression (granted, someone who
is rather adept at them, and is really a Perl coder who is just now
learning Python). I'm not sure we gained much by it, but then I had
thought to do this:

Take the entire list of spamphrases, form one big regex out of them,
and, instead of looping through the list, simply do a single regex test
on the subject line. We are looking to run this filter on a community
server that gets lots of emails to lots of different accounts and we
expect a number of people will be using it. We don't want to have the
sysadmin tell us to turn it off later, due to resource usage. So, I
thought that a single regex test would be more efficient than looping
through the list as we had done before.
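
One way the single-regex idea could look is sketched below. The phrase
list and the names build_spam_regex and is_junk are assumptions made for
illustration. Each phrase is passed through re.escape so that characters
such as "$" or "?" are treated literally, and the pattern is compiled once
rather than once per message:

```python
import re

# Hypothetical phrase list (an assumption), already lowercased.
spamphrases = ["free money", "act now", "$$$"]

def build_spam_regex(phrases):
    # Escape each phrase so regex metacharacters match literally, then
    # join the phrases into one alternation: phrase1|phrase2|...
    alternatives = [re.escape(p) for p in phrases]
    return re.compile("|".join(alternatives))

# Compile once at startup and reuse for every incoming message.
spam_re = build_spam_regex(spamphrases)

def is_junk(subject):
    # A single search over the lowercased subject replaces the loop.
    return spam_re.search(subject.lower()) is not None
```

Note that an empty phrase list would produce an empty pattern, which
matches every subject, so that case needs guarding. Whether one large
alternation actually beats the plain find() loop is not a given, since the
regex engine may still try the alternatives one by one at each position;
timing both versions on real subject lines (for example with the timeit
module) would settle it.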

Comments on this logic not only welcome, but actually desired!!!

Sheila King

"When introducing your puppy to an adult cat,
restrain the puppy, not the cat." -- Gwen Bailey,
_The Perfect Puppy: How to Raise a Well-behaved Dog_