regex alternation problem
ptmcg at austin.rr.com
Sat Apr 18 00:28:39 CEST 2009
On Apr 17, 4:49 pm, Jesse Aldridge <JesseAldri... at gmail.com> wrote:
> import re
> s1 = "I am an american"
> s2 = "I am american an "
> for s in [s1, s2]:
>     print re.findall(" (am|an) ", s)
> # Results:
> # ['am']
> # ['am', 'an']
> I want the results to be the same for each string. What am I doing wrong?
Does it help if you expand your RE to its full expression, with '_'s
where the blanks go:
"_am_" or "_an_"
Now look for these in "I_am_an_american". After the first "_am_" is
processed, findall resumes scanning at the leading 'a' of 'an'. The
blank in front of "an" was already consumed as the trailing blank of
"_am_", so "_an_" has no leading blank and does not match. If you search
through "I_am_american_an_", both "am" and "an" have their own
surrounding spaces, so both are found.
Instead of using explicit spaces, try using '\b', which matches at a word
boundary without consuming any characters:
>>> import re
>>> re.findall(r"\b(am|an)\b", "I am an american")
['am', 'an']
>>> re.findall(r"\b(am|an)\b", "I am american an")
['am', 'an']
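To see the consumed spaces directly, here is a small Python 3 sketch (the question's code is Python 2) that prints the span of each match found by re.finditer, which scans the string the same way findall does. The helper `spans` and the pattern labels are made up for illustration. The third pattern, which uses a lookbehind `(?<= )` and a lookahead `(?= )`, is an alternative not suggested in this thread.

```python
import re

s1 = "I am an american"
s2 = "I am american an "

patterns = {
    # The question's pattern: the spaces are part of each match.
    "spaces": re.compile(" (am|an) "),
    # \b is zero-width: it tests for a word boundary but consumes nothing.
    "boundary": re.compile(r"\b(am|an)\b"),
    # Alternative not from this thread (illustrative assumption): check for
    # the spaces with a lookbehind and a lookahead, which are also zero-width.
    "lookaround": re.compile(r"(?<= )(am|an)(?= )"),
}

def spans(pattern, s):
    """Return (word, (start, end)) for each match, scanning like findall."""
    return [(m.group(1), m.span()) for m in pattern.finditer(s)]

for s in (s1, s2):
    print(repr(s))
    for name, pattern in patterns.items():
        print(f"  {name:<10} {spans(pattern, s)}")
```

With the original pattern, the first match in "I am an american" covers indices 1 to 4, both spaces included, so the next search starts at index 5, the 'a' of "an", with no space in front of it. The \b and lookaround spans cover only the two letters, which is why both strings give ['am', 'an']. The lookaround version still needs a real space on each side, so unlike \b it will not match a word at the very start or end of the string, or one next to punctuation.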
Your find pattern includes (and consumes) a leading AND trailing space
around each word. In the first string, "I am an american", there is a
leading and trailing space around "am", but the trailing space for
"am" is also the leading space for "an". Once " am " has matched, that
space is used up, so " an " cannot match.