Efficient String Lookup?

Andrew Dalke adalke at mindspring.com
Sun Oct 17 04:22:57 CEST 2004


Chris S. wrote:
> The problem is I want to associate some data with my pattern, as in a 
> dictionary. Basically, my application consists of a number of 
> conditions, represented as strings with wildcards. Associated to each 
> condition is arbitrary data explaining "what I must do".
   ...
> However, I'm uncertain about the efficiency of this approach. I'd like 
> to use regexps, but how would I associate data with each pattern?

One way is with groups.  Turn each pattern into a regexp, wrap
it in a group, then join the groups with "|" as
   (pat1)|(pat2)|(pat3)| ... |(patN)

Do the match and find which group has the non-None value; its
position tells you which pattern matched, and so which data to use.
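
As a small sketch of that lookup: when each alternative is a single
top-level group with no groups nested inside it, the match object's
lastindex attribute gives the 1-based number of the group that matched,
so the groups don't have to be scanned one by one.  The rules and the
lookup() helper below are made up for illustration.

```python
import re

# Hypothetical pattern/data pairs, for illustration only.
rules = [
    ("ab..", "first rule"),
    ("b..f", "second rule"),
    ("zz", "third rule"),
]

# One capturing group per rule: (ab..)|(b..f)|(zz)
combined = re.compile("|".join("(%s)" % pat for pat, _ in rules))

def lookup(text):
    """Return the data for the first rule matching at the start of text, or None."""
    m = combined.match(text)
    if m is None:
        return None
    # lastindex is 1-based; the rules list is 0-based.
    return rules[m.lastindex - 1][1]
```

If a pattern could contain groups of its own, lastindex no longer maps
directly onto the rule number, and scanning m.groups() or using named
groups would be the safer route.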

You may need to tack a "$" on the end of the regexp so that the
whole string has to match (in which case remember to enclose
everything in a () so the $ doesn't affect only the last pattern).
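
To see why the enclosing parentheses matter, here is a minimal sketch
with two made-up alternatives.  "|" binds more loosely than "$", so in
the first regexp the "$" belongs to the second alternative only.  The
enclosing group is written as a non-capturing (?:...), which is an
assumption of this sketch rather than something required, so that the
group numbers of the individual patterns stay the same.

```python
import re

loose = re.compile("(ab)|(cd)$")        # "$" applies to (cd) only
strict = re.compile("(?:(ab)|(cd))$")   # "$" applies to both alternatives

# "abXYZ" starts with "ab" but has trailing characters.
loose_hit = loose.match("abXYZ") is not None    # True: (ab) is unanchored
strict_hit = strict.match("abXYZ") is not None  # False: "$" rejects the tail

# On an exact input both match, and "ab" is still group 1 in both.
same_group = loose.match("ab").lastindex == strict.match("ab").lastindex == 1
```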

One thing to worry about is that you can only have 99 groups
in a pattern.
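
If the pattern list can grow past that limit, one possible workaround
is to compile the patterns in batches and try each batch in turn.  This
is only a sketch: compile_batches(), find_first() and the batch size of
50 are names and values chosen for illustration, not part of the re
module, and it assumes the individual patterns contain no groups of
their own.  (Python 3.5 and later dropped the group limit, so there a
single regexp is enough.)

```python
import re

def compile_batches(patterns, batch_size=50):
    """Compile patterns into several regexps of at most batch_size groups each.

    Returns a list of (offset, compiled_regexp) pairs, where offset is the
    index in patterns of the first pattern in that batch.
    """
    batches = []
    for start in range(0, len(patterns), batch_size):
        chunk = patterns[start:start + batch_size]
        regexp = "(?:%s)$" % "|".join("(%s)" % p for p in chunk)
        batches.append((start, re.compile(regexp)))
    return batches

def find_first(batches, text):
    """Return the index of the first pattern that matches all of text, or None."""
    for offset, pat in batches:
        m = pat.match(text)
        if m is not None:
            # lastindex is the 1-based number of the group that matched.
            return offset + m.lastindex - 1
    return None
```

The returned index can then be used to look up the associated data in
a parallel list, the same way config_data[i][1] is used in the example.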

Here's example code.


import re

config_data = [
  ("abc#e#", "Reactor meltdown imminent"),
  ("ab##", "Antimatter containment field breach"),
  ("b####f", "Coffee too strong"),
  ]

as_regexps = ["(%s)" % pattern.replace("#", ".")
                  for (pattern, text) in config_data]

# Wrap the alternatives in a non-capturing group so that "$" applies
# to every pattern, not only the last one.  Group numbers are unchanged.
full_regexp = "(?:" + "|".join(as_regexps) + ")$"
pat = re.compile(full_regexp)


input_data = [
     "abadb",
     "abcdef",
     "zxc",
     "abcq",
     "b1234f",
     ]

for text in input_data:
     m = pat.match(text)
     if not m:
         print("%s?  That's okay." % (text,))
     else:
         # Group i+1 of the regexp corresponds to config_data[i].
         for i, val in enumerate(m.groups()):
             if val is not None:
                 print("%s?  We've got a %r warning!" %
                       (text, config_data[i][1]))



Here's the output from running it:


abadb?  That's okay.
abcdef?  We've got a 'Reactor meltdown imminent' warning!
zxc?  That's okay.
abcq?  We've got a 'Antimatter containment field breach' warning!
b1234f?  We've got a 'Coffee too strong' warning!
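
One caveat with the conversion in the example: pattern.replace("#", ".")
leaves any other regexp metacharacters in the condition string (".",
"+", "(" and so on) active.  If the conditions can contain such
characters, a possible approach is to escape the literal pieces with
re.escape and put "." only where the wildcards were.  The helper name
wildcard_to_regexp() is made up for this sketch.

```python
import re

def wildcard_to_regexp(pattern, wildcard="#"):
    """Turn a wildcard pattern into a regexp, escaping the literal parts."""
    # Split on the wildcard, escape each literal piece, and rejoin the
    # pieces with "." so each wildcard matches exactly one character.
    literal_parts = pattern.split(wildcard)
    return ".".join(re.escape(part) for part in literal_parts)
```

For the patterns in config_data this gives the same regexps as before,
e.g. "abc#e#" still becomes "abc.e.", while a condition such as "a.b#"
now matches a literal "." in the second position.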


				Andrew
				dalke at dalkescientific.com


