Regular Expression AND match

Fuzzyman michael at foord.net
Mon Mar 22 09:21:13 CET 2004


"Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.194.1079807509.742.python-list at python.org>...
> Fuzzyman wrote:
> > Jeff Epler <jepler at unpythonic.net> wrote in message 
> > news:<mailman.161.1079716498.742.python-list at python.org>...
> > > Regular expressions are not a good tool for this purpose.
> > 
> > Hmm... I'm not sure if I've been helped or not :-)
> > Thanks anyway.....
> > 
> > Odd that you can't do this easily with regular expressions - I suppose
> > it doesn't compile down to a neat test.... but then it's hardly a
> > complex search... OTOH I have *no idea* how regular expressions
> > actually work (and no need to find out)...
>  
> From one Fu.*man to another ;) you do have a need to find out, even if
> you don't recognize it. Start with A.M. Kuchling's excellent,
> Python-based tutorial at: http://www.amk.ca/python/howto/regex/
> 
> At the least, you should understand why a regex is not an all-in-one
> solution to your issue. It basically comes down to the fact that a regex
> is geared to do its analysis in a single pass over your text. As it
> finds partial matches, it may backtrack to try to find a complete match,
> but in general, it moves forward. Therefore, if you want to find three
> words in a *declared* order in your text, a single regex can do it
> easily. If you want to find three words in *any* order, the simplest
> solution using regexes is to perform three separate searches. There are
> ways to get around this within a regex, but they're neither as simple
> nor as maintainable as letting Python do the iteration:
> 
> >>> import re
> >>> text = 'Some aa text cc with bb search terms.'
> >>> search_terms = ['aa', 'bb', 'cc']
> >>> [re.findall(re.escape(word), text) for word in search_terms]
> [['aa'], ['bb'], ['cc']]
> 
> or, for your case:
> 
> >>> def has_all_terms(content, terms):
> ...     for word in terms:
> ...         if not re.search(re.escape(word), content):
> ...             return False
> ...     return True
> ... 
> >>> has_all_terms(text, search_terms)
> True
> >>> has_all_terms('A is for aardvark.', search_terms)
> False
> 

Thanks for your reply, it was both helpful and interesting.
Unfortunately it only confirmed my suspicion that what I wanted to do
wasn't possible :-(

The database 'module' I'm using (which is basically fine for my other
purposes) searches through its records using regular
expressions. A full search through 1800 records (each record can be a
couple of k of text) takes about 0.2 seconds - which is fine... but
doing 4 or 5 searches and comparing results *isn't* (a 0.2 second delay
is ok - 0.8-1.0 seconds isn't). So I'm currently just searching for
the longest word - KirbyBase then returns the full *text* of each song
containing that word... and I'm just checking each song to see if it
has the other words :-)

This works fine, isn't noticeably slow, but isn't as elegant as I'd
hoped...
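
Roughly, the approach looks like the sketch below. The names
find_songs and search_db are made up for illustration, and search_db
is only a plain-Python stand-in for the database's regex search, not
the real KirbyBase API:

```python
import re

def search_db(records, pattern):
    """Hypothetical stand-in for the database's regex search.

    Returns the full text of every record matching the pattern.
    """
    regex = re.compile(pattern)
    return [text for text in records if regex.search(text)]

def find_songs(records, terms):
    """Return records containing every term, using only one regex search."""
    if not terms:
        return list(records)
    # Search on the longest word, guessing it is the rarest and so
    # gives the fewest candidates to check afterwards.
    longest = max(terms, key=len)
    candidates = search_db(records, re.escape(longest))
    others = [term for term in terms if term != longest]
    # Plain substring tests on the candidates for the remaining words.
    return [text for text in candidates
            if all(term in text for term in others)]

songs = [
    'amazing grace how sweet the sound',
    'grace and peace',
    'how sweet it is',
]
matches = find_songs(songs, ['grace', 'sweet', 'how'])
```

Only the first pass goes through the (slow) regex search; the rest is
ordinary string checking on a much smaller list.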

Regards,


Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html

> 
> HTCYTIRABM!
> 
> Robert Brewer
> MIS
> Amor Ministries
> fumanchu at amor.org


