[Tutor] Regular expressions: findall vs search

Tue Jul 10 21:45:32 CEST 2012

On Tue, Jul 10, 2012 at 9:26 PM, Alexander Q. <redacted@example.com> wrote:

> I'm a bit confused about extracting data using re.search or re.findall.
>
> Say I have the following code: tuples =
> re.findall(r'blahblah(\d+)yattayattayatta(\w+)moreblahblahblah(\w+)over',
> text)
>
> So I'm looking for that string in 'text', and I intend to extract the
> parts which have parentheses around them. And it works: the variable
> "tuples", which I assigned to get the return of re.findall, returns a tuple
> list, each 'element' therein being a tuple of 3 elements (which is what I
> wanted since I had 3 sets of parentheses).
>
> My question is how does Python know to return just the part in the
> parentheses and not to return the "blahblah" and the "yattayattayatta",
> etc...? The 're.search' function returns the whole thing, and if I want
> just the parentheses parts, I do tuples.group(1) or tuples.group(2) or
> tuples.group(3), depending on which set of parentheses I want. Does the
> re.findall command by default ignore anything outside of the parentheses
> and only return the parentheses as a grouping within one tuple (i.e., the
> first element in "tuples" would be, as it is, a list comprised of 3
> elements corresponding respectively to the 1st, 2nd, and 3rd parentheses)?
> Thank you for reading.
>
> -Alex
>

From the documentation for re.findall:

The *string* is scanned left-to-right, and matches are returned in the
order found. If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern has more than
one group.
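
To make that concrete, here is a small sketch of the three cases the documentation describes. The sample string and the patterns are made up for illustration; they are not from Alex's code.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "id=12 name=ann; id=7 name=bob"

# No groups: each list item is the whole matched string.
no_groups = re.findall(r"id=\d+ name=\w+", text)

# One group: each list item is just that group's string.
one_group = re.findall(r"id=(\d+) name=\w+", text)

# Two groups: each list item is a tuple with one entry per group.
two_groups = re.findall(r"id=(\d+) name=(\w+)", text)

print(no_groups)   # ['id=12 name=ann', 'id=7 name=bob']
print(one_group)   # ['12', '7']
print(two_groups)  # [('12', 'ann'), ('7', 'bob')]
```

So the text outside the parentheses still has to match; it just isn't part of what findall hands back.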

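That should clear everything up: as soon as the pattern contains groups,
findall returns only the groups, whereas search returns a match object, from
which you pick group(0) (the whole match) or a numbered group yourself. As
for *why* findall behaves this way, I have no idea. It may be legacy behavior.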
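
If you want the whole match as well as the groups for every occurrence, one option is re.finditer, which yields match objects just like the one re.search returns. A minimal sketch, again with a made-up sample string and pattern:

```python
import re

# Hypothetical sample text and pattern, invented for this illustration.
text = "id=12 name=ann; id=7 name=bob"
pattern = re.compile(r"id=(\d+) name=(\w+)")

# search stops at the first match and returns a match object (or None).
first = pattern.search(text)
whole = first.group(0)   # the entire matched text
parts = first.groups()   # all captured groups as a tuple

# finditer yields one match object per match, so both the whole
# match and the groups are available for every occurrence.
all_matches = [(m.group(0), m.groups()) for m in pattern.finditer(text)]

# A non-capturing group (?:...) groups without capturing, so findall
# goes back to returning the whole match.
non_capturing = re.findall(r"(?:id|ID)=\d+", text)

print(whole)          # id=12 name=ann
print(parts)          # ('12', 'ann')
print(all_matches)    # [('id=12 name=ann', ('12', 'ann')), ('id=7 name=bob', ('7', 'bob'))]
print(non_capturing)  # ['id=12', 'id=7']
```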

HTH,
Hugo