How to write this repeat matching?
rxjwg98 at gmail.com
Mon Jul 7 09:30:47 EDT 2014
On Sunday, July 6, 2014 3:26:44 PM UTC-4, Ian wrote:
> On Sun, Jul 6, 2014 at 12:57 PM, <rxjwg98 at gmail.com> wrote:
>
> > I write the following code:
> >
> > .......
> > import re
> >
> > line = "abcdb"
> >
> > matchObj = re.match( 'a[bcd]*b', line)
> >
> > if matchObj:
> >     print "matchObj.group() : ", matchObj.group()
> >     print "matchObj.group(0) : ", matchObj.group()
> >     print "matchObj.group(1) : ", matchObj.group(1)
> >     print "matchObj.group(2) : ", matchObj.group(2)
> > else:
> >     print "No match!!"
> > .........
> >
> > In which I have used its match pattern, but the result is not 'abcb'
>
> You're never going to get a match of 'abcb' on that string, because
> 'abcb' is not found anywhere in that string.
>
> There are two possible matches for the given pattern over that string:
> 'abcdb' and 'ab'. The first one matches the [bcd]* three times, and
> the second one matches it zero times. Because the matching is greedy,
> you get the result that matches three times. It cannot match one, two
> or four times because then there would be no 'b' following the [bcd]*
> portion as required by the pattern.
>
> > Only matchObj.group(0): abcdb
> >
> > displays. All other group(s) have no content.
>
> Calling match.group(0) is equivalent to calling match.group without
> arguments. In that case it returns the matched string of the entire
> regular expression. match.group(1) and match.group(2) will return the
> value of the first and second matching group respectively, but the
> pattern does not have any matching groups. If you want a matching
> group, then enclose the part that you want it to match in parentheses.
> For example, if you change the pattern to:
>
>     matchObj = re.match('a([bcd]*)b', line)
>
> then the value of matchObj.group(1) will be 'bcd'

Because I am new to Python, I may not have described the question clearly.
Could you read the original example on the web:
https://docs.python.org/2/howto/regex.html
It says that the match is 'abcb'. Could you explain that to me? Thanks again.
The relevant passage is:

A step-by-step example will make this more obvious. Let's consider the
expression a[bcd]*b. This matches the letter 'a', zero or more letters from
the class [bcd], and finally ends with a 'b'. Now imagine matching this RE
against the string abcbd.

Step  Matched  Explanation
1     a        The a in the RE matches.
2     abcbd    The engine matches [bcd]*, going as far as it can, which is
               to the end of the string.
3     Failure  The engine tries to match b, but the current position is at
               the end of the string, so it fails.
4     abcb     Back up, so that [bcd]* matches one less character.
5     Failure  Try b again, but the current position is at the last
               character, which is a 'd'.
6     abc      Back up again, so that [bcd]* is only matching bc.
7     abcb     Try b again. This time the character at the current position
               is 'b', so it succeeds.
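
For comparison, here is a minimal sketch that runs the same pattern against
both strings: 'abcbd' is the string used in the HOWTO table, while 'abcdb' is
the one in the code quoted earlier in this message. The variable names are
only illustrative, and the single-argument print() calls are written so the
snippet should run on either Python 2 or Python 3.

```python
import re

pattern = 'a[bcd]*b'

# String from the HOWTO walk-through: the last 'b' is followed by a 'd',
# so the greedy [bcd]* has to back up twice before the final b can match.
howto_match = re.match(pattern, 'abcbd')
print(howto_match.group())    # abcb

# String from the code quoted in this thread: it ends in 'b', so [bcd]*
# only has to back up once and the whole string is matched.
thread_match = re.match(pattern, 'abcdb')
print(thread_match.group())   # abcdb

# Adding parentheses turns the [bcd]* part into a capturing group, which
# shows what it ended up matching after backtracking.
grouped_match = re.match('a([bcd]*)b', 'abcbd')
print(grouped_match.group(1))  # bc
```

The two strings differ only in the order of their last two characters, which
seems to be enough to change the result from 'abcb' to 'abcdb'.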