regexp question

Fredrik Lundh fredrik at pythonware.com
Fri Dec 5 07:41:35 EST 2003


"python_charmer2000" wrote:

> I want to match several regexps against a large body of text.  What I
> have so far is similar to this:
>
> re1 = <some regexp>
> re2 = <some regexp>
> re3 = <some regexp>
>
> big_re = re.compile(re1 + '|' + re2 + '|' + re3)
>
> matches = big_re.finditer(file_list)
> for match in matches:
>     span = match.span()
>     print "matched text =", file_list[span[0]:span[1]]
>     print "matched re =", match.re.pattern
>
> Now the "match.re.pattern" is the entire regexp, big_re.  But I want
> to print out the portion of the big re that was matched -- was it re1?
>  re2?  or re3?  Is it possible to determine this, or do I have to make
> a second pass through the collection of re's and compare them against
> the "matched text" in order to determine which part of the big_re was
> matched?

you could put each expression inside parentheses, so that each one
becomes its own capturing group, and use the match object's lastindex
attribute to find which subexpression matched:

    import re

    # each alternative is wrapped in its own capturing group
    res = [
        "(a+)",
        "(b+)",
        "(c+)"
    ]

    big_re = re.compile("|".join(res))

    for match in big_re.finditer("abba"):
        print("matched text =", match.group())
        # lastindex is the 1-based number of the group that matched
        print("matched re =", res[match.lastindex - 1])

prints

    matched text = a
    matched re = (a+)
    matched text = bb
    matched re = (b+)
    matched text = a
    matched re = (a+)
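
note that indexing the list with lastindex only lines up as long as the
subexpressions contain no capturing groups of their own, since every extra
group shifts the numbering.  a minimal sketch of one way around that,
assuming you are free to give each alternative a name: wrap each one in a
named group and read the match object's lastgroup attribute instead.  the
labels "re1".."re3", the patterns, and the find_all helper are made up for
illustration only.

```python
import re

# Hypothetical labels and patterns, chosen only for illustration.
patterns = [
    ("re1", r"a+"),
    ("re2", r"(b)+"),   # has a capturing group of its own
    ("re3", r"c+"),
]

# Wrap every alternative in a named group: (?P<re1>a+)|(?P<re2>(b)+)|...
big_re = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in patterns)
)

def find_all(text):
    """Return (matched text, label of the alternative that matched) pairs."""
    return [(m.group(), m.lastgroup) for m in big_re.finditer(text)]

for text, name in find_all("abbacc"):
    print("matched text =", text)
    print("matched re =", name)
```

here the "cc" match is group number 4 because of the extra group inside
re2, so a plain list lookup by lastindex would no longer line up, while the
name stays correct.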

</F>







