[Tutor] re.compile concatenating two search words

Steve Willoughby steve at alchemy.com
Tue Sep 30 00:43:11 CEST 2008


On Mon, Sep 29, 2008 at 03:19:02PM -0700, Srinivas Iyyer wrote:
> Hi Tutors, 
> I have a list with elements as strings. I want to search if any of these element strings has two words of my interest. How can I ask re.compile to look for both words. 
> 
> my words are 'good' and 'bad'.
> 
> import re
> 
> pat = re.compile('good'+'bad') 

Ok, first of all think about how the compiler will
interpret and act upon this expression.  You are
calling a function with a single argument.

That argument is 'good'+'bad', which means to
concatenate those two strings into a single 
string value 'goodbad', and that is the value
passed to re.compile().  The re.compile() function
cannot know that you added the strings, since that
is done before it ever gets to work, so if you
were thinking the + was somehow meaningful to the
regular expression, that was a mistaken impression.
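To see this for yourself, here is a small sketch. The variable names and
the two test strings are just ones I made up for illustration:

```python
import re

# The + runs first, so re.compile() only ever sees one plain string.
pattern_text = 'good' + 'bad'
print(pattern_text)  # goodbad

pat = re.compile(pattern_text)

# The compiled pattern looks for the literal text "goodbad".
match_joined = pat.search('a goodbad example')
match_separate = pat.search('good is acceptable while bad is unwanted')

print(match_joined is not None)    # True
print(match_separate is not None)  # False
```

So a sentence that merely contains both words somewhere is not matched.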

> a = ['Rama is a good boy','Raghu is a good boy','Sita is a good girl','Ravana is a bad boy','Duryodhan is a bad guy','good is an acceptable nature while bad is unwanted nature in a person']

So you want to match any sentence with BOTH 'good' AND 'bad' in
it?  You could say:

re.compile('good.*bad')

but that would work only if 'good' came first.  It would not match
a string like 'The bad thing about good people is...'

You could do re.compile('good.*bad|bad.*good') I suppose, or several
other possibilities.  I'm not sure exactly what to suggest since I
am guessing you're trying to give a simplified example and aren't
literally looking for this pattern in your program.
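As a rough sketch of the either-order pattern in action (the sample
sentences are a few of yours plus my own example, and the names both,
samples and results are made up here):

```python
import re

# Either order: "good ... bad" or "bad ... good".
both = re.compile('good.*bad|bad.*good')

samples = [
    'Rama is a good boy',
    'Ravana is a bad boy',
    'good is an acceptable nature while bad is unwanted nature in a person',
    'The bad thing about good people is...',
]

# search() scans the whole string; match() would only look at the start.
results = [bool(both.search(s)) for s in samples]
print(results)  # [False, False, True, True]
```

Note that this matches substrings too, so 'goodness' and 'badly' would
count.  If that matters, word boundaries (\b) in a raw string are one
way to tighten it.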

> pat.findall(a)

findall is really for finding all the occurrences of a pattern
in a string, not all the matching strings in a list.
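For illustration, here is roughly what findall does with one string,
and what happens when it is handed a list.  The pattern 'good|bad' is
my own choice for the demonstration, not something from your code:

```python
import re

pat = re.compile('good|bad')
sentence = 'good is an acceptable nature while bad is unwanted nature in a person'

# findall returns every occurrence of the pattern within ONE string.
found = pat.findall(sentence)
print(found)  # ['good', 'bad']

# Passing a list instead of a string is an error.
list_rejected = False
try:
    pat.findall([sentence])
except TypeError:
    list_rejected = True
print(list_rejected)  # True
```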

You can do this with filter() to get what I think you're trying
to accomplish.
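A minimal sketch of the filter() approach, assuming the either-order
pattern 'good.*bad|bad.*good' is what you are after (the name matching
is made up):

```python
import re

pat = re.compile('good.*bad|bad.*good')

a = ['Rama is a good boy', 'Raghu is a good boy', 'Sita is a good girl',
     'Ravana is a bad boy', 'Duryodhan is a bad guy',
     'good is an acceptable nature while bad is unwanted nature in a person']

# pat.search returns a match object (true) or None (false), so it can
# serve directly as the test function for filter().
matching = list(filter(pat.search, a))
print(matching)
```

Only the last sentence contains both words, so that is the only one
kept.  A list comprehension, [s for s in a if pat.search(s)], does the
same job if you find it easier to read.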

-- 
Steve Willoughby    |  Using billion-dollar satellites
steve at alchemy.com   |  to hunt for Tupperware.

