template strings for matching?
MRAB
google at mrabarnett.plus.com
Thu Oct 9 17:53:12 EDT 2008
On Oct 9, 5:20 pm, Joe Strout <j... at strout.net> wrote:
> Wow, this was harder than I thought (at least for a rusty Pythoneer
> like myself). Here's my stab at an implementation. Remember, the
> goal is to add a "match" method to Template which works like
> Template.substitute, but in reverse: given a string, if that string
> matches the template, then it should return a dictionary mapping each
> template field to the corresponding value in the given string.
>
> Oh, and as one extra feature, I want to support a ".greedy" attribute
> on the Template object, which determines whether the matching of
> fields should be done in a greedy or non-greedy manner.
>
> ------------------------------------------------------------
> #!/usr/bin/python
>
> from string import Template
> import re
>
> def templateMatch(self, s):
>     # start by finding the fields in our template, and building a map
>     # from field position (index) to field name.
>     posToName = {}
>     pos = 1
>     for item in self.pattern.findall(self.template):
>         # each item is a tuple where item 1 is the field name
>         posToName[pos] = item[1]
>         pos += 1
>
>     # determine if we should match greedy or non-greedy
>     greedy = False
>     if self.__dict__.has_key('greedy'):
>         greedy = self.greedy
>
>     # now, build a regex pattern to compare against s
>     # (taking care to escape any characters in our template that
>     # would have special meaning in regex)
>     pat = self.template.replace('.', '\\.')
>     pat = pat.replace('(', '\\(')
>     pat = pat.replace(')', '\\)') # there must be a better way...
>
>     if greedy:
>         pat = self.pattern.sub('(.*)', pat)
>     else:
>         pat = self.pattern.sub('(.*?)', pat)
>     p = re.compile(pat)
>
>     # try to match this to the given string
>     match = p.match(s)
>     if match is None: return None
>     out = {}
>     for i in posToName.keys():
>         out[posToName[i]] = match.group(i)
>     return out
>
> Template.match = templateMatch
>
> t = Template("The $object in $location falls mainly in the $subloc.")
> print t.match( "The rain in Spain falls mainly in the train." )
> ------------------------------------------------------------
>
> This sort-of works, but it won't properly handle $$ in the template,
> and I'm not too sure whether it handles the ${fieldname} form,
> either. Also, it only escapes '.', '(', and ')' in the template...
> there must be a better way of escaping all characters that have
> special meaning to RegEx, except for '$' (which is why I can't use
> re.escape).
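On the escaping question: one possible approach is to leave the template string alone and walk it with Template.pattern.finditer(), applying re.escape only to the literal text between fields. Template.pattern exposes the named groups "escaped", "named", "braced" and "invalid", so $$ and ${fieldname} can be told apart. What follows is a rough sketch, not a tested implementation: the helper name template_to_regex, its greedy parameter, the trailing \Z anchor and the backreference for repeated fields are all assumptions made for illustration.

```python
import re
from string import Template

def template_to_regex(template, greedy=False):
    """Hypothetical helper: turn string.Template source text into a compiled regex."""
    wildcard = ".*" if greedy else ".*?"
    parts = []
    seen = set()
    last = 0
    for m in Template.pattern.finditer(template):
        # Literal text between placeholders: escape all of it.
        parts.append(re.escape(template[last:m.start()]))
        name = m.group("named") or m.group("braced")
        if m.group("escaped") is not None:
            # "$$" in the template stands for one literal dollar sign.
            parts.append(re.escape("$"))
        elif name is None:
            # Template.pattern reports a stray "$" through its "invalid" group.
            raise ValueError("invalid placeholder at offset %d" % m.start())
        elif name in seen:
            # Assumption: a repeated field must match the same text again.
            parts.append("(?P=%s)" % name)
        else:
            seen.add(name)
            parts.append("(?P<%s>%s)" % (name, wildcard))
        last = m.end()
    parts.append(re.escape(template[last:]))
    # Assumption: the whole string must match, so anchor at the end.
    return re.compile("".join(parts) + r"\Z")

rx = template_to_regex("The $object in $location falls mainly in the $subloc.")
print(rx.match("The rain in Spain falls mainly in the train.").groupdict())
```

With the end anchor in place, the greedy flag only matters when a separator also occurs inside a field value, e.g. the template "$a-$b" against "1-2-3".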
>
> Probably the rest of the code could be improved too. I'm eager to
> hear your feedback.
>
> Thanks,
> - Joe
How about something like:
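import re

def placeholder(m):
    if m.group(1):
        # $name becomes a named capture group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ becomes a literal dollar sign
        return "\\$"
    else:
        # any other text is matched literally
        return re.escape(m.group(3))

# the third alternative captures the literal text between placeholders,
# so that it reaches the re.escape branch above
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+)")

t = "The $object in $location falls mainly in the $subloc."
print(regex.sub(placeholder, t))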
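
To get from the generated pattern to the dictionary you asked for, the match object's groupdict() should do the job, since every field is a named group. Here is a minimal sketch along those lines. It uses a regex with a third alternative, ([^$]+), so that literal text reaches the re.escape branch. The names TOKEN and match_template, the greedy parameter and the \Z end anchor are assumptions for illustration only.

```python
import re

# Tokenizer: $name, $$, or a run of literal text (third alternative assumed).
TOKEN = re.compile(r"\$(\w+)|(\$\$)|([^$]+)")

def placeholder(m, greedy=True):
    if m.group(1):
        # $name becomes a named group; ".+?" is the non-greedy variant.
        return "(?P<%s>%s)" % (m.group(1), ".+" if greedy else ".+?")
    elif m.group(2):
        # $$ becomes a literal dollar sign.
        return "\\$"
    else:
        # Literal text: escape anything special to the regex engine.
        return re.escape(m.group(3))

def match_template(template, s, greedy=True):
    """Hypothetical helper: return {field: value}, or None if s does not match."""
    pattern = TOKEN.sub(lambda m: placeholder(m, greedy), template)
    # Assumption: require the whole string to match.
    m = re.match(pattern + r"\Z", s)
    return m.groupdict() if m else None

t = "The $object in $location falls mainly in the $subloc."
print(match_template(t, "The rain in Spain falls mainly in the train."))
```

The greedy flag only changes the result when the text between two fields also occurs inside a value: matching "$a-$b" against "1-2-3" gives a = "1-2" when greedy and a = "1" when not.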