[Tutor] Advanced String Search using operators AND, OR etc..
Lie Ryan
lie.1296 at gmail.com
Tue May 5 14:11:22 CEST 2009
Alex Feddor wrote:
> Hi
>
> I am looking for a method that enables advanced text string search. The
> string.find() method and the re module do not seem to support what I am looking for.
> The idea is as follows:
>
> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> I would like to scan the text using AND and OR operators and get -1 or
> another value if the search terms are not found in the text.
>
> Example 01:
> search criteria: "FDA" AND ( "approve*" OR "supported")
> The catch is that in Text variable FDA and approve words are not one
> after another (other words are in between).
Bring on your hardest searches...
class Pattern(object): pass

class Logical(Pattern):
    # Base class for binary operators; subclasses set self.op and self.op_name
    def __init__(self, pat1, pat2):
        self.pat1 = pat1
        self.pat2 = pat2
    def __call__(self, text):
        a, b = self.pat1(text), self.pat2(text)
        # len(text) is the "not found" sentinel
        if self.op(a != len(text), b != len(text)):
            return min((a, b))
        return len(text)
    def __str__(self):
        return '(%s %s %s)' % (self.pat1, self.op_name, self.pat2)

class P(Pattern):
    # Leaf pattern: plain substring search
    def __init__(self, pat):
        self.pat = pat
    def __call__(self, text):
        ret = text.find(self.pat)
        return ret if ret != -1 else len(text)
    def __str__(self):
        return '"%s"' % self.pat

class NOT(Pattern):
    def __init__(self, pat):
        self.op_name = 'NOT'
        self.pat = pat
    def __call__(self, text):
        ret = self.pat(text)
        # See the caveat below: there is no natural index for "matched by absence"
        return ret - 1 if ret == len(text) else len(text)
    def __str__(self):
        return '%s (%s)' % (self.op_name, self.pat)

class XOR(Logical):
    def __init__(self, pat1, pat2):
        self.op_name = 'XOR'
        self.op = lambda a, b: not(a and b) and (a or b)
        super().__init__(pat1, pat2)

class OR(Logical):
    def __init__(self, pat1, pat2):
        self.op_name = 'OR'
        self.op = lambda a, b: a or b
        super().__init__(pat1, pat2)

class AND(Logical):
    def __init__(self, pat1, pat2):
        self.op_name = 'AND'
        self.op = lambda a, b: a and b
        super().__init__(pat1, pat2)

class Suite(object):
    # Wrapper that converts the len(text) sentinel back to -1
    def __init__(self, pat):
        self.pat = pat
    def __call__(self, text):
        ret = self.pat(text)
        return ret if ret != len(text) else -1
    def __str__(self):
        return '[%s]' % self.pat

pat1 = P('FDA')
# Note: str.find() treats '*' literally, so this is not a wildcard
pat2 = P('approve*')
pat3 = P('supported')

p = Suite(AND(pat1, OR(pat2, pat3)))
print(p(''))
print(p('FDA'))
print(p('FDA supported'))
print(p('supported FDA'))
print(p('blah FDA bloh supported blih'))
print(p('blah FDA bleh supported bloh supported blih '))

p = Suite(AND(OR(pat1, pat2), XOR(pat2, NOT(pat3))))
print(p)
print(p(''))
print(p('FDA'))
print(p('FDA supported'))
print(p('supported sdc FDA sd'))
print(p('blah blih FDA bluh'))
print(p('blah blif supported blog'))
#################
I guess I went a bit overboard here (had too much time on hand). It
works by function composition: instead of evaluating the expression
directly, you compose a function (or more accurately, a callable object)
that evaluates the logical expression and returns the index of the first
item that matches it. It currently uses str's builtin find(), but I guess
it wouldn't be very hard to adapt it to use the re-based myfind() below
(only the P class would need to change).
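As a rough sketch of that adaptation (RP is a made-up name, not part of the code above, and it calls re.search() directly rather than going through myfind()), a regex-based leaf only has to keep P's contract: return the index of the first match, or len(text) when there is none. A pattern such as \bapprove\w* can then stand in for the "approve*" wildcard from the question:

```python
import re

class RP(object):
    """Hypothetical regex-based replacement for P (assumed name).

    Same contract as P: index of the first match, or len(text) if none.
    """
    def __init__(self, pat, flags=0):
        self.pat = pat
        self.regex = re.compile(pat, flags)
    def __call__(self, text):
        m = self.regex.search(text)
        return m.start() if m else len(text)
    def __str__(self):
        return 'r"%s"' % self.pat

# \w* plays the role of the trailing wildcard in "approve*"
approve = RP(r'\bapprove\w*')
print(approve('New drug is approved'))  # 12
print(approve('no match'))              # 8, i.e. len(text): not found
```

Since the other classes only ever call the leaf and compare its result with len(text), an RP instance should be usable wherever a P instance is, but I have not tested every combination.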
The Suite class is only there to turn the not-found sentinel from
len(text) into -1 (I used len(text) since it simplifies the code a lot:
min() of two results then always picks the earliest real match).
Caveat: The NOT class cannot reliably convert a False to True because I
don't know what index number to use.
The code is written to save vertical space, so it is not the most
readable in the world.
No guarantee that it is bug-free.
Idea:
Override the operators on the Pattern class so we could write
P("Hello") & P("World") instead of AND(P("Hello"), P("World"))
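A minimal sketch of that idea, assuming a simplified layout in which Logical takes the operator name and function as arguments instead of having AND/OR/XOR subclasses (that change is mine, made only to keep the example short and self-contained):

```python
class Pattern(object):
    # &, | and ^ build the same kind of composite callable as AND/OR/XOR
    def __and__(self, other):
        return Logical('AND', lambda a, b: a and b, self, other)
    def __or__(self, other):
        return Logical('OR', lambda a, b: a or b, self, other)
    def __xor__(self, other):
        return Logical('XOR', lambda a, b: a != b, self, other)

class Logical(Pattern):
    def __init__(self, op_name, op, pat1, pat2):
        self.op_name, self.op = op_name, op
        self.pat1, self.pat2 = pat1, pat2
    def __call__(self, text):
        a, b = self.pat1(text), self.pat2(text)
        # len(text) is the "not found" sentinel, as in the code above
        if self.op(a != len(text), b != len(text)):
            return min(a, b)
        return len(text)
    def __str__(self):
        return '(%s %s %s)' % (self.pat1, self.op_name, self.pat2)

class P(Pattern):
    def __init__(self, pat):
        self.pat = pat
    def __call__(self, text):
        ret = text.find(self.pat)
        return ret if ret != -1 else len(text)
    def __str__(self):
        return '"%s"' % self.pat

# & binds tighter than |, so the OR part needs its own parentheses
p = P('FDA') & (P('approve') | P('supported'))
text = ('FDA meeting was successful. New drug is approved for whole sale '
        'distribution!')
print(p)                          # ("FDA" AND ("approve" OR "supported"))
print(p(text))                    # 0
print(p('New drug is approved'))  # 20, i.e. len(text): "FDA" is missing
```

The result can still be wrapped in Suite to get -1 instead of len(text).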
> Example 02:
> search criteria: "Ben"
> The catch is that the code should find only the exact word Ben, not words
> whose first three letters are Ben, such as Benquick, Benseek etc.
> Only Ben is the right word we are looking for.
The second one was easier...
import re

def myfind(pattern, text):
    # Lazily skip a prefix, then require `pattern` as a whole word
    # (\b is a word boundary)
    pattern = r'(.*?)\b(%s)\b(.*)' % pattern
    m = re.match(pattern, text)
    if m:
        return len(m.group(1))  # index where the word starts
    return None  # not found

textfound = 'This is a Ben test string'
texttrick = 'This is a Benquick Benseek McBen QuickBenSeek string'
textnotfound = 'He is away'
textmulti = 'Our Ben found another Ben which is quite odd'
pat = 'Ben'
print(myfind(pat, textfound)) # 10
print(myfind(pat, texttrick)) # None
print(myfind(pat, textnotfound)) # None
print(myfind(pat, textmulti)) # 4
If you only want to test for existence, simply:

pattern = 'Ben'
if re.match(r'(.*?)\b(%s)\b(.*)' % pattern, text):
    pass
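A possible variation on the existence test, as a sketch only (contains_word is a made-up helper name): re.search() scans the whole string by itself, so the (.*?) and (.*) groups are not needed, and re.escape() keeps any special characters in the word from being read as regex syntax. It also copes with multi-line text, which the re.match() version does not, because . does not match a newline by default.

```python
import re

def contains_word(word, text):
    """Hypothetical helper: True if `word` occurs in `text` as a whole word."""
    return re.search(r'\b%s\b' % re.escape(word), text) is not None

print(contains_word('Ben', 'This is a Ben test string'))          # True
print(contains_word('Ben', 'This is a Benquick Benseek string'))  # False
print(contains_word('Ben', 'first line\nBen on the second'))      # True
```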
> I would really appreciate your advice - code samples / links on how the above
> can be achieved! If possible I would appreciate a solution that uses
> a free-of-charge module.
Standard library is free of charge, no?