Efficient String Lookup?
Chris S.
chrisks at NOSPAM.udel.edu
Sun Oct 17 01:56:07 EDT 2004
Andrew Dalke wrote:
> One way is with groups. Make each pattern into a regexp
> pattern then concatenate them as
> (pat1)|(pat2)|(pat3)| ... |(patN)
>
> Do the match and find which group has the non-None value.
>
> You may need to tack a "$" on the end of string (in which
> case remember to enclose everything in a () so the $ doesn't
> affect only the last pattern).
>
> One thing to worry about is that you can only have 99 groups
> in a pattern.
>
> Here's example code.
>
>
> import re
>
> config_data = [
>     ("abc#e#", "Reactor meltdown imminent"),
>     ("ab##", "Antimatter containment field breach"),
>     ("b####f", "Coffee too strong"),
> ]
>
> as_regexps = ["(%s)" % pattern.replace("#", ".")
>               for (pattern, text) in config_data]
>
> full_regexp = "|".join(as_regexps) + "$"
> pat = re.compile(full_regexp)
>
>
> input_data = [
>     "abadb",
>     "abcdef",
>     "zxc",
>     "abcq",
>     "b1234f",
> ]
>
> for text in input_data:
>     m = pat.match(text)
>     if not m:
>         print("%s? That's okay." % (text,))
>     else:
>         # The group that is not None identifies the pattern that matched.
>         for i, val in enumerate(m.groups()):
>             if val is not None:
>                 print("%s? We've got a %r warning!" % (text,
>                                                        config_data[i][1],))
>
>
>
> Here's the output I got when I ran it
>
>
> abadb? We've got a 'Antimatter containment field breach' warning!
> abcdef? We've got a 'Reactor meltdown imminent' warning!
> zxc? That's okay.
> abcq? We've got a 'Antimatter containment field breach' warning!
> b1234f? We've got a 'Coffee too strong' warning!
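
A minimal sketch of the anchoring advice quoted above, written for Python 3 and offered as one possible reading rather than the poster's own code. In the quoted example the "$" is appended to the joined alternation, so it binds only to the last pattern. This variant wraps the alternation in a non-capturing (?:...) group so the "$" applies to every pattern while the group numbers still line up with config_data. It also reads m.lastindex instead of scanning m.groups(). The function name warning_for is hypothetical.

```python
import re

config_data = [
    ("abc#e#", "Reactor meltdown imminent"),
    ("ab##", "Antimatter containment field breach"),
    ("b####f", "Coffee too strong"),
]

# One capturing group per pattern, with "#" standing for any single character.
groups = ["(%s)" % pattern.replace("#", ".") for pattern, _ in config_data]

# (?:...) groups the whole alternation without adding a numbered group,
# so the trailing "$" applies to every alternative.
anchored = re.compile("(?:%s)$" % "|".join(groups))


def warning_for(text):
    """Return the warning for text, or None if no pattern matches it fully."""
    m = anchored.match(text)
    if m is None:
        return None
    # lastindex is the 1-based number of the capturing group that matched.
    return config_data[m.lastindex - 1][1]
```

With the anchor applied to every pattern, "abadb" is no longer reported, because "ab##" covers only its first four characters.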
Thanks, that's almost exactly what I'm looking for. The only downside I
see is that I still need to add and remove patterns, so continually
recompiling the expression might be expensive.
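
One way to limit the recompiling cost, sketched here for Python 3 under stated assumptions: keep the patterns in a table and rebuild the combined expression lazily, only on the first lookup after a change, so a burst of additions and removals costs a single compile. The class name PatternTable and its methods are hypothetical, and whether this is cheap enough depends on how often lookups interleave with changes.

```python
import re


class PatternTable:
    """Hypothetical helper: combined regex rebuilt lazily after edits."""

    def __init__(self):
        self._messages = {}    # "#"-wildcard pattern -> warning text
        self._order = []       # patterns in the order used for the groups
        self._compiled = None  # cached combined regex, None when stale

    def add(self, pattern, message):
        self._messages[pattern] = message
        self._compiled = None  # mark the cache stale; nothing is compiled yet

    def remove(self, pattern):
        del self._messages[pattern]
        self._compiled = None

    def _rebuild(self):
        self._order = list(self._messages)
        groups = []
        for pattern in self._order:
            # Escape literal characters; "#" becomes the "." wildcard.
            body = "".join("." if c == "#" else re.escape(c) for c in pattern)
            groups.append("(%s)" % body)
        self._compiled = re.compile("|".join(groups))

    def lookup(self, text):
        if self._compiled is None:
            self._rebuild()  # at most one compile per batch of changes
        # Prefix match, like pat.match in the quoted example.
        m = self._compiled.match(text)
        if m is None or m.lastindex is None:
            return None
        return self._messages[self._order[m.lastindex - 1]]


table = PatternTable()
table.add("abc#e#", "Reactor meltdown imminent")
table.add("ab##", "Antimatter containment field breach")
first = table.lookup("abcq")   # compiles once, then matches "ab##"
table.remove("ab##")
second = table.lookup("abcq")  # recompiles once; nothing matches now
```

If lookups and changes alternate one for one, this still compiles on every lookup. In that case compiling each pattern separately and trying them in turn trades a slower lookup for cheaper updates.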