regexp question
Fredrik Lundh
fredrik at pythonware.com
Fri Dec 5 07:41:35 EST 2003
"python_charmer2000" wrote:
> I want to match several regexps against a large body of text. What I
> have so far is similar to this:
>
> re1 = <some regexp>
> re2 = <some regexp>
> re3 = <some regexp>
>
> big_re = re.compile(re1 + '|' + re2 + '|' + re3)
>
> matches = big_re.finditer(file_list)
> for match in matches:
>     span = match.span()
>     print "matched text =", file_list[span[0]:span[1]]
>     print "matched re =", match.re.pattern
>
> Now the "match.re.pattern" is the entire regexp, big_re. But I want
> to print out the portion of the big re that was matched -- was it re1?
> re2? or re3? Is it possible to determine this, or do I have to make
> a second pass through the collection of re's and compare them against
> the "matched text" in order to determine which part of the big_re was
> matched?
you could put each expression inside parentheses, and use the lastindex
attribute to find the subexpression:
    import re, string

    res = [
        "(a+)",
        "(b+)",
        "(c+)"
    ]

    big_re = re.compile(string.join(res, "|"))

    matches = big_re.finditer("abba")
    for match in matches:
        span = match.span()
        print "matched text =", match.group()
        print "matched re =", res[match.lastindex-1]
prints:
matched text = a
matched re = (a+)
matched text = bb
matched re = (b+)
matched text = a
matched re = (a+)
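
Note that indexing the list with lastindex-1 relies on each subexpression containing exactly one capturing group; any extra capturing group inside a subexpression shifts the group numbers. A minimal sketch of one way around that, assuming Python 3 syntax and named groups looked up via the match object's lastgroup attribute (the labels run_of_a, run_of_b, run_of_c and the helper find_labelled are made up for illustration and are not part of the original question):

```python
import re

# Hypothetical (label, sub-pattern) pairs, chosen only for illustration.
# Order matters: earlier alternatives are tried first.
pairs = [
    ("run_of_a", r"a+"),
    ("run_of_b", r"b+"),
    ("run_of_c", r"c+"),
]
patterns = dict(pairs)

# Wrap each sub-pattern in a named group: (?P<label>pattern)
big_re = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in pairs))


def find_labelled(text):
    """Return (label, sub-pattern, matched text) for every match in text."""
    return [
        (m.lastgroup, patterns[m.lastgroup], m.group())
        for m in big_re.finditer(text)
    ]


for label, pattern, matched in find_labelled("abba"):
    print("matched text =", matched)
    print("matched re =", pattern, "(%s)" % label)
```

Because the lookup goes through the group name rather than its position, a subexpression may contain further unnamed groups of its own without disturbing the mapping.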
</F>