How do I get to *all* of the groups of an re search?

Harvey Thomas hst at empolis.co.uk
Fri Jan 10 10:01:49 EST 2003


Cameron Laird wrote:
> 
> In article <sl51f-6dj.ln1 at news.lairds.org>,
> Kyler Laird  <Kyler at news.Lairds.org> wrote:
> >	http://www.python.org/doc/current/lib/re-syntax.html
> >	(...)
> >	    Matches whatever regular expression is inside the
> >	    parentheses, and indicates the start and end of a 
> >	    group; the contents of a group can be retrieved
> >	    after a match has been performed, [...]
> >
> >Sounds good, so I tried it.
> >
> >	import re
> >
> >	text = 'foo foo1 foo2 bar bar1 bar2 bar3'
> >
> >	test_re = re.compile('([a-z]+)( \\1[0-9]+)+')
> >
> >	print test_re.findall(text)
> >
> >I expected the matches to be something like
> >	[('foo', [' foo1', ' foo2']), ('bar', [' bar1', ' bar2', ' bar3'])]
> >but it's just this.
> >	[('foo', ' foo2'), ('bar', ' bar3')]
> >
> >How do I get to the other groups that were matched?  (Is this
> >an FAQ?  I don't know where to start looking.)
> 			.
> 			.
> 			.
> Oh, it's matching all the groups.  Does the code below help
> explain why?
> 
> I'm clumsy with REs--I don't immediately see how to achieve
> your desired result.  I can quickly observe that
>   import re
> 
>   text = 'foo foo1 foo2 bar bar1 bar2 bar3'
> 
>   test_re = re.compile('([a-z]+)(( \\1[0-9]+)+)')
> 
>   print test_re.findall(text) 
> yields
>   [('foo', ' foo1 foo2', ' foo2'), ('bar', ' bar1 bar2 bar3', ' bar3')]
> One of us will probably get an RE that properly listifies these
> within the next day ...
> -- 
> 
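You can't return a variable number of groups from a regex. The number of groups returned is always the number of (capturing) groups in the pattern, and a group that repeats keeps only its last match, which is why you see ' foo2' and ' bar3'. However, you can capture the whole repeated run in a single group and split it afterwards: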

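import re
t = 'foo foo1 foo2 bar bar1 bar2 bar3 singleton'
# Group 1 is the base word; group 2 is the whole run of " <word><digits>" items.
# (?:...) is non-capturing, so findall still returns 2-tuples.
e = re.compile(r'([a-z]+)((?: +\1[0-9]+)*)')
print([[x[0]] + x[1].split() for x in e.findall(t)])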

yields

[['foo', 'foo1', 'foo2'], ['bar', 'bar1', 'bar2', 'bar3'], ['singleton']]

which seems pretty close to what you want.
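
If you want exactly the nested shape from the original question (the word, then a list of its repeats with the leading spaces kept), a second pass over the captured run should do it. This is only a rough sketch: the names `outer` and `grouped` are made up for illustration, and it assumes the repeats always look like the base word followed by digits.

```python
import re

text = 'foo foo1 foo2 bar bar1 bar2 bar3 singleton'

# Same pattern as above: group 1 is the base word, group 2 is the whole
# run of numbered repeats (possibly empty).
outer = re.compile(r'([a-z]+)((?: +\1[0-9]+)*)')

def grouped(s):
    """Return [(word, [repeat, ...]), ...]. Illustrative helper, not a library call."""
    result = []
    for m in outer.finditer(s):
        word, run = m.group(1), m.group(2)
        # Second pass: pull each " <word><digits>" item out of the captured
        # run, keeping the leading space as in the expected output.
        items = re.findall(' +' + re.escape(word) + '[0-9]+', run)
        result.append((word, items))
    return result

print(grouped(text))
```

For the text above this should print

[('foo', [' foo1', ' foo2']), ('bar', [' bar1', ' bar2', ' bar3']), ('singleton', [])]

so a word with no repeats comes back with an empty list rather than being dropped.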

HTH

Harvey

More information about the Python-list mailing list