a regexp riddle: re.search(r'

Yingjie Lan lanyjie at yahoo.com
Thu Nov 25 04:44:44 EST 2010


--- On Thu, 11/25/10, Phlip <phlip2005 at gmail.com> wrote:
> From: Phlip <phlip2005 at gmail.com>
> Subject: a regexp riddle: re.search(r'
> To: python-list at python.org
> Date: Thursday, November 25, 2010, 8:46 AM
> HypoNt:
> 
> I need to turn a human-readable list into a list():
> 
>    print re.search(r'(?:(\w+), |and (\w+))+', 'whatever a, bbb, and c').groups()
> 
> That currently returns ('c',). I'm trying to match "any word \w+
> followed by a comma, or a final word preceded by and."
> 
> The match returns 'a, bbb, and c', but the groups return ('bbb', 'c').
> What do I type for .groups() to also get the 'a'?
> 

First of all, 'bbb' corresponds to the first capturing
group and 'c' to the second. The 'a' is forgotten because
it was the first match of the first group, and it was then
overwritten by the second match, 'bbb'.

Generally, a capturing group only remembers the last match.
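
A small sketch of that behaviour, using the pattern and sample string
from the question (written for Python 3, so print is a function; the
variable names are only for illustration):

```python
import re

text = 'whatever a, bbb, and c'

# The non-capturing group repeats three times, but each capturing
# group keeps only the text from its most recent iteration.
m = re.search(r'(?:(\w+), |and (\w+))+', text)

print(m.group(0))   # the whole match: 'a, bbb, and c'
print(m.groups())   # ('bbb', 'c') -- 'a' was overwritten by 'bbb'
```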

It also seems that your re may match this: 'and c',
which does not seem to be your intention.
So it may be more intuitively written as:

r'(?:(\w+), )+and (\w+)'
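
To illustrate the difference, here is a rough comparison of the two
patterns (Python 3; the names loose and strict are just labels made up
for this example):

```python
import re

loose = re.compile(r'(?:(\w+), |and (\w+))+')    # pattern from the question
strict = re.compile(r'(?:(\w+), )+and (\w+)')    # rewritten pattern

# The original pattern is satisfied by the 'and ...' branch alone.
loose_hit = loose.search('and c')
print(loose_hit.group(0))    # 'and c'

# The rewritten pattern needs at least one 'word, ' before 'and'.
strict_miss = strict.search('and c')
print(strict_miss)           # None

# On the full sentence it still keeps only the last comma item.
strict_hit = strict.search('whatever a, bbb, and c')
print(strict_hit.groups())   # ('bbb', 'c')
```

Note that the rewritten pattern also requires a comma before 'and',
so a plain 'a and c' would not match it.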

I'm not sure how to get it done in one step,
but it would be easy to first get the whole 
match, then process it with:

re.findall(r'(\w+)(?:,|$)', the_whole_match)
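
Put together, the two steps might look like this (a sketch in
Python 3, assuming the input always contains a matching list; real
code should check that re.search did not return None):

```python
import re

text = 'whatever a, bbb, and c'

# Step 1: locate the human-readable list inside the larger string.
m = re.search(r'(?:(\w+), )+and (\w+)', text)
the_whole_match = m.group(0)          # 'a, bbb, and c'

# Step 2: pull out every word that is followed by a comma or by the
# end of the matched text; 'and' is skipped because a space follows it.
words = re.findall(r'(\w+)(?:,|$)', the_whole_match)
print(words)                          # ['a', 'bbb', 'c']
```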

cheers,

Yingjie



      


