Regular Expression AND mach
michael at foord.net
Mon Mar 22 09:21:13 CET 2004
"Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.194.1079807509.742.python-list at python.org>...
> Fuzzyman wrote:
> > Jeff Epler <jepler at unpythonic.net> wrote in message
> > news:<mailman.161.1079716498.742.python-list at python.org>...
> > > Regular expressions are not a good tool for this purpose.
> > Hmm... I'm not sure if I've been helped or not :-)
> > Thanks anyway.....
> > Odd that you can't do this easily with regular expressions - I suppose
> > it doesn't compile down to a neat test.... but then it's hardly a
> > complex search... OTOH I have *no idea* how regular expressions
> > actually work (and no need to find out)...
> From one Fu.*man to another ;) you do have a need to find out, even if
> you don't recognize it. Start with A.M. Kuchling's excellent,
> Python-based tutorial at: http://www.amk.ca/python/howto/regex/
> At the least, you should understand why a regex is not an all-in-one
> solution to your issue. It basically comes down to the fact that a regex
> is geared to do its analysis in a single pass over your text. As it
> finds partial matches, it may backtrack to try to find a complete match,
> but in general, it moves forward. Therefore, if you want to find three
> words in a *declared* order in your text, a single regex can do it
> easily. If you want to find three words in *any* order, the simplest
> solution using regexes is to perform three separate searches. There are
> ways to get around this within a regex, but they're neither as simple
> nor as maintainable as letting Python do the iteration:
> >>> import re
> >>> text = 'Some aa text cc with bb search terms.'
> >>> search_terms = ['aa', 'bb', 'cc']
> >>> [re.findall(re.escape(word), text) for word in search_terms]
> [['aa'], ['bb'], ['cc']]
> or, for your case:
> >>> def has_all_terms(content, terms):
> ...     for word in terms:
> ...         if not re.search(re.escape(word), content):
> ...             return False
> ...     return True
> ...
> >>> has_all_terms(text, search_terms)
> True
> >>> has_all_terms('A is for aardvark.', search_terms)
> False
Thanks for your reply, it was both helpful and interesting.
Unfortunately it only confirmed my suspicion that what I wanted to do
wasn't possible :-(
The database 'module' I'm using (which is basically fine for my other
purposes) does searches through its records using regular
expressions. A full search through 1800 records (each record can be a
couple of k of text) takes about 0.2 seconds - which is fine.... but
doing 4 or 5 searches and comparing results *isn't* (0.2 seconds delay
is ok - 0.8-1.0 seconds isn't).... So I'm currently just searching for
the longest word - KirbyBase then returns the full *text* of each song
containing that word... and I'm just checking each song to see if it
has the other words :-)
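
Roughly, the two-stage check looks like the sketch below. It is only an illustration: the list of song texts stands in for whatever the database hands back, and the function name songs_with_all_terms is made up for this example, not part of KirbyBase. The single regex search on the longest word plays the role of the database query; the remaining words are checked with plain substring tests.

```python
import re


def songs_with_all_terms(songs, terms):
    """Return the song texts that contain every term.

    `songs` is a plain list of strings standing in for the database
    records (an assumption for this sketch); `terms` are literal words.
    """
    if not terms:
        return list(songs)

    # Stage 1: one regex search on the longest word, which is likely
    # to be the most selective. This stands in for the database search.
    longest = max(terms, key=len)
    pattern = re.compile(re.escape(longest))
    candidates = [song for song in songs if pattern.search(song)]

    # Stage 2: check the (usually much smaller) candidate list for the
    # other words with ordinary substring tests.
    others = [term for term in terms if term != longest]
    return [song for song in candidates
            if all(term in song for term in others)]


songs = [
    'amazing grace how sweet the sound',
    'grace alone',
    'sweet amazing day',
]
print(songs_with_all_terms(songs, ['grace', 'amazing', 'sweet']))
```

Only the first pass touches every record, so the cost stays close to that of a single search.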
This works fine, isn't noticeably slow, but isn't as elegant as I'd
> Robert Brewer
> Amor Ministries
> fumanchu at amor.org