I don't understand this regex.groups() behaviour

Michael Chermside mcherm at mcherm.com
Mon Jun 16 20:27:11 CEST 2003


Grzegorz Adam Hankiewicz writes:
> I don't understand why the last two sentences of the following
> interactive session don't return more than two groups.
> 
> Python 2.2.3 (#1, Jun  5 2003, 14:02:17)
> Type "copyright", "credits" or "license" for more information.
> 
> In [1]: import re
> 
> In [2]: c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
> 
> In [3]: re.match(r'("A+?";)("A+?"$)', c)
> 
> In [4]: re.match(r'("A+?";)+("A+?"$)', c).groups()
> Out[4]: ('"AAAA";', '"AAAAA"')
> 
> In [5]: re.match(r'("A+?";){4}("A+?"$)', c).groups()
> Out[5]: ('"AAAA";', '"AAAAA"')
> 
> Could somebody please explain why multiple groups aren't returned?

First of all, multiple groups ARE returned. Specifically, Out[5]
is a tuple of *two* strings. But it's not what you want. .groups()
returns one entry for every capturing group in your regular
expression[1]. A group that is repeated with + or {4} is still a
single group, and it only remembers what it matched on its last
repetition, which is why you get '"AAAA";' and nothing before it.
But from the looks of your example, what you REALLY want is to take
a regular expression and find all the places where it matches. The
function you want is called "findall", and it is used like this:


>>> import re
>>> c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
>>> re.findall(r'("A+?";?)', c)
['"A";', '"AA";', '"AAA";', '"AAAA";', '"AAAAA"']
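The two-group result from Out[4] and Out[5] can be checked in
isolation. This is a small sketch for modern Python 3 rather than the
2.2 shell above; Pattern.groups reports how many capturing groups a
compiled pattern has, and the (?:...) variant is an added illustration
that was not part of the original question.

```python
import re

c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
pattern = r'("A+?";)+("A+?"$)'

# The pattern contains two capturing groups, so .groups() has two entries
# however many times the first group repeats.
print(re.compile(pattern).groups)  # 2

m = re.match(pattern, c)
print(m.groups())                  # ('"AAAA";', '"AAAAA"')

# A repeated group only keeps the text from its last repetition.
print(repr(m.group(1)))            # '"AAAA";'

# A non-capturing group (?:...) is not counted by .groups() at all.
m2 = re.match(r'(?:"A+?";)+("A+?"$)', c)
print(m2.groups())                 # ('"AAAAA"',)
```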

I have simplified your regular expression somewhat so that I no
longer require the ; between all fields, but I expect that after
you realize that findall is what you want, you'll be able to
straighten out the details fairly easily.
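One possible way to put the mandatory ; back, sketched for modern
Python 3 (re.fullmatch needs 3.4 or later): check the whole string
against a stricter pattern first, then let findall pull out the
fields. The names FIELD, RECORD and split_fields are hypothetical, and
the pattern assumes every field is a double-quoted run of A's with a
single ; between fields and none at the end.

```python
import re

# Hypothetical names; assumes fields look like "A", "AA", ... separated by ';'.
FIELD = r'"A+"'
RECORD = re.compile(FIELD + r'(?:;' + FIELD + r')*')

def split_fields(s):
    """Return every quoted field in s, or None if s is not a well-formed record."""
    if RECORD.fullmatch(s) is None:
        return None
    # FIELD has no capturing groups, so findall returns the whole matches.
    return re.findall(FIELD, s)

c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
print(split_fields(c))          # ['"A"', '"AA"', '"AAA"', '"AAAA"', '"AAAAA"']
print(split_fields('"A""AA"'))  # None (missing separator)
```

Unlike the findall call above, this variant drops the separators from
the results and rejects strings where a ; is missing or left dangling.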

-- Michael Chermside
