[Tutor] Why doesn't this regex match???

Sheila King sheila@thinkspot.net
Fri, 08 Feb 2002 22:53:08 -0800


On Sat, 09 Feb 2002 01:25:25 -0500, Tim Peters <tim.one@comcast.net>
wrote about RE: [Tutor] Why doesn't this regex match???:

...<excellent summary snipped--now I understand what wasn't working>...

> Here's another approach to your problem:
> 
> 1. Convert all your phrases to (say) lowercase first.
> 
> 2. Say the list is spamphrases, and the subject line is "subject".
> 
>    Then
> 
>        s = subject.lower()
>        isjunk = 0
>        for phrase in spamphrases:
>            if s.find(phrase) >= 0:
>                isjunk = 1
>                break
> 
>    is worth considering.  It won't tie your head in knots, anyway.

We were doing something like this before. Actually, what we were doing
was more like:

        s = subject.lower()
        s = ' ' + s + ' '    # pad so phrases with surrounding spaces can match at either end
        isjunk = 0
        for phrase in spamphrases:
            if s.find(phrase) >= 0:
                isjunk = 1
                break
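For comparison, here is a minimal Python 3 sketch of that padded loop wrapped in a function. The name `is_junk` and the sample phrases are hypothetical, and `phrase in s` is simply the modern spelling of `s.find(phrase) >= 0`. Padding the subject with one space on each side lets a phrase stored with surrounding spaces, such as `' free '`, match a word at the very start or end of the subject while still skipping longer words like "freedom".

```python
def is_junk(subject, spamphrases):
    """Return True if any (already lowercased) spam phrase occurs in the subject.

    The subject is lowercased and padded with one space on each side, so
    phrases stored with surrounding spaces (e.g. ' free ') can match a word
    at the very start or end of the subject line.
    """
    s = ' ' + subject.lower() + ' '
    # any() stops at the first hit, just like the explicit break in the loop.
    return any(phrase in s for phrase in spamphrases)


# Hypothetical phrase list, lowercased up front as Tim suggests.
spamphrases = [' free ', 'make money fast', ' viagra ']

print(is_junk('FREE offer inside', spamphrases))          # True
print(is_junk('Freedom of the press', spamphrases))       # False
print(is_junk('How to MAKE MONEY FAST!!!', spamphrases))  # True
```

One limitation of the space padding: punctuation still counts as part of a word, so a subject such as "Free!" would not match `' free '`.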

Then someone else suggested a regular expression (granted, someone who
is rather adept at them, and is really a Perl coder who is just now
learning Python). I'm not sure we gained much by it, but I had then
thought to do this:

Take the entire list of spamphrases, form one big regex out of them,
and, instead of looping through the list, simply do a single regex test
on the subject line. We are looking to run this filter on a community
server that receives a lot of email for many different accounts, and we
expect a number of people will be using it. We don't want the sysadmin
to tell us to turn it off later because of resource usage. So I thought
that a single regex test would be more efficient than looping through
the list as we had done before.
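The single-regex idea might be sketched as follows. Each phrase goes through `re.escape()` so characters such as `.`, `?` or `$` are matched literally, the escaped phrases are joined with `|` into one alternation, and the pattern is compiled once up front so each message costs a single `search()` call. The function names and phrases are hypothetical, and `re.IGNORECASE` stands in for the manual lowercasing. Whether this beats the loop is not guaranteed: Python's `re` is a backtracking engine that may try each alternative at each position in the subject, so timing both versions on real subject lines is the safest way to decide.

```python
import re
import timeit


def build_spam_regex(spamphrases):
    """Compile one case-insensitive alternation out of every phrase."""
    if not spamphrases:
        # An empty pattern would match every subject, so refuse it.
        raise ValueError("need at least one spam phrase")
    # re.escape makes regex metacharacters ('.', '?', '$', ...) match literally.
    pattern = '|'.join(re.escape(p) for p in spamphrases)
    return re.compile(pattern, re.IGNORECASE)


def is_junk_regex(subject, spam_re):
    # Pad like the loop version so space-delimited phrases match at the edges.
    return spam_re.search(' ' + subject + ' ') is not None


def is_junk_loop(subject, spamphrases):
    # The padded substring loop, for comparison.
    s = ' ' + subject.lower() + ' '
    return any(phrase in s for phrase in spamphrases)


# Hypothetical phrase list, all lowercase.
spamphrases = [' free ', 'make money fast', ' viagra ', 'act now!']
spam_re = build_spam_regex(spamphrases)

print(is_junk_regex('Act NOW! before it ends', spam_re))  # True
print(is_junk_regex('Freedom of the press', spam_re))     # False

# Rough timing harness; the numbers depend entirely on your phrases and machine.
subject = 'Re: minutes from the Tuesday planning meeting'
print(timeit.timeit(lambda: is_junk_loop(subject, spamphrases), number=10000))
print(timeit.timeit(lambda: is_junk_regex(subject, spam_re), number=10000))
```

Compiling once matters more than the choice between the two approaches: rebuilding the pattern for every message would throw away most of any advantage the single regex has.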

Comments on this logic not only welcome, but actually desired!!!

-- 
Sheila King
http://www.thinkspot.net/sheila/

"When introducing your puppy to an adult cat,
restrain the puppy, not the cat." -- Gwen Bailey,
_The Perfect Puppy: How to Raise a Well-behaved Dog_