Efficient String Lookup?

Chris S. chrisks at NOSPAM.udel.edu
Sun Oct 17 01:56:07 EDT 2004


Andrew Dalke wrote:

> One way is with groups.  Make each pattern into a regexp
> pattern then concatenate them as
>   (pat1)|(pat2)|(pat3)| ... |(patN)
> 
> Do the match and find which group has the non-None value.
> 
> You may need to tack a "$" on the end of the pattern (in which
> case remember to enclose everything in parentheses so the $
> doesn't apply only to the last pattern).
> 
> One thing to worry about is that you can only have 99 groups
> in a pattern.
> 
> Here's example code.
> 
> 
> import re
> 
> config_data = [
>  ("abc#e#", "Reactor meltdown imminent"),
>  ("ab##", "Antimatter containment field breach"),
>  ("b####f", "Coffee too strong"),
>  ]
> 
> as_regexps = ["(%s)" % pattern.replace("#", ".")
>                  for (pattern, text) in config_data]
> 
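> # Note: the "$" here binds only to the last alternative, so a shorter
> # pattern such as "ab##" can still match just a prefix of the input.
> full_regexp = "|".join(as_regexps) + "$"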
> pat = re.compile(full_regexp)
> 
> 
> input_data = [
>     "abadb",
>     "abcdef",
>     "zxc",
>     "abcq",
>     "b1234f",
>     ]
> 
> for text in input_data:
>     m = pat.match(text)
>     if not m:
>         print("%s?  That's okay." % (text,))
>     else:
>         # Exactly one alternative matched; find its non-None group.
>         for i, val in enumerate(m.groups()):
>             if val is not None:
>                 print("%s?  We've got a %r warning!"
>                       % (text, config_data[i][1]))
> 
> 
> 
> Here's the output I got when I ran it
> 
> 
> abadb?  We've got a 'Antimatter containment field breach' warning!
> abcdef?  We've got a 'Reactor meltdown imminent' warning!
> zxc?  That's okay.
> abcq?  We've got a 'Antimatter containment field breach' warning!
> b1234f?  We've got a 'Coffee too strong' warning!
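For a modern reader, here is a Python 3 sketch of the same groups technique with the anchoring advice above applied: the alternation is wrapped in a non-capturing group `(?:...)`, so "$" anchors every pattern without shifting the group numbers, and `m.lastindex` picks out the group that matched instead of scanning `m.groups()`. The helper names `to_regexp` and `lookup` are made up for illustration, and the sketch assumes "#" means exactly one arbitrary character and that a pattern must match the whole input. It also escapes the literal parts of each pattern in case they contain regexp metacharacters. (Python 3.5 lifted the limit on the number of capturing groups, so the 99-group caveat applies to older versions.)

```python
import re

# Same example data as in the quoted post; "#" stands for any single character.
config_data = [
    ("abc#e#", "Reactor meltdown imminent"),
    ("ab##", "Antimatter containment field breach"),
    ("b####f", "Coffee too strong"),
]


def to_regexp(pattern):
    # Hypothetical helper: escape the literal parts, turn each "#" into ".".
    return ".".join(re.escape(part) for part in pattern.split("#"))


# One capturing group per pattern, all inside a non-capturing group so the
# trailing "$" anchors every alternative, not just the last one.
full_regexp = "(?:%s)$" % "|".join("(%s)" % to_regexp(p) for p, _ in config_data)
pat = re.compile(full_regexp)


def lookup(text):
    """Return the warning for the first pattern matching all of text, or None."""
    m = pat.match(text)
    if m is None:
        return None
    # m.lastindex is the 1-based number of the capturing group that matched.
    return config_data[m.lastindex - 1][1]


for text in ["abadb", "abcdef", "zxc", "abcq", "b1234f"]:
    warning = lookup(text)
    if warning is None:
        print("%s?  That's okay." % text)
    else:
        print("%s?  We've got a %r warning!" % (text, warning))
```

With every alternative anchored, "abadb" no longer counts as a match for "ab##", so this version reports it as okay where the output above reported a warning.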

Thanks, that's almost exactly what I'm looking for. The only downside I 
see is that I still need to add and remove patterns, so continually 
recompiling the expression might be expensive.
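One way to soften the recompilation cost is to rebuild lazily: mark the combined expression stale whenever a pattern is added or removed, and recompile only on the next lookup. A minimal sketch, assuming Python 3.7+ (for insertion-ordered dicts) and the same "#" wildcard convention; `PatternTable` and its methods are hypothetical names, not part of any library.

```python
import re


class PatternTable:
    """Hypothetical helper: maps wildcard patterns ("#" = any one character)
    to messages and rebuilds the combined regexp only when it is stale."""

    def __init__(self):
        self._entries = {}   # pattern -> message, in insertion order
        self._regexp = None  # compiled alternation, or None if stale
        self._messages = []  # message for each capturing group, in order

    def add(self, pattern, message):
        self._entries[pattern] = message
        self._regexp = None  # mark stale; recompile on the next lookup

    def remove(self, pattern):
        del self._entries[pattern]
        self._regexp = None

    def _compile(self):
        parts = []
        self._messages = []
        for pattern, message in self._entries.items():
            body = ".".join(re.escape(p) for p in pattern.split("#"))
            parts.append("(%s)" % body)
            self._messages.append(message)
        # An empty alternation would match the empty string, so fall back
        # to a lookahead that never matches when the table is empty.
        source = "(?:%s)$" % "|".join(parts) if parts else r"(?!)"
        self._regexp = re.compile(source)

    def lookup(self, text):
        if self._regexp is None:
            self._compile()
        m = self._regexp.match(text)
        if m is None:
            return None
        return self._messages[m.lastindex - 1]


table = PatternTable()
table.add("abc#e#", "Reactor meltdown imminent")
table.add("ab##", "Antimatter containment field breach")
print(table.lookup("abcq"))  # compiles once, then matches "ab##"
table.remove("ab##")
print(table.lookup("abcq"))  # recompiled without "ab##"; no match
```

Any number of add/remove calls between two lookups then cost a single compile. If the pattern set changes before nearly every lookup this buys little, since the combined expression still has to be rebuilt each time.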


