one more question on regex
mg
noOne at nowhere.com
Fri Jan 22 17:47:16 EST 2016
On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:
> 2016-01-22 16:50 GMT+01:00 mg <noOne at nowhere.com>:
>> On Fri, 22 Jan 2016 15:32:57 +0000, mg wrote:
>>
>>> python 3.4.3
>>>
>>> >>> import re
>>> >>> re.search('(ab){2}','abzzabab')
>>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>>
>>> >>> re.findall('(ab){2}','abzzabab')
>>> ['ab']
>>>
>>> Why for search() the match is 'abab' and for findall the match is
>>> 'ab'?
>>
>> finditer seems to be consistent with search:
>> regex = re.compile('(ab){2}')
>>
>> for match in regex.finditer('abzzababab'):
>>     print("%s: %s" % (match.start(), match.span()))
>> ...
>> 4: (4, 8)
>>
>
> Hi,
> as was already pointed out, findall "collects" the content of the
> capturing groups (if present), rather than the whole matching text;
>
> for a repeated capturing group, only the last captured content is kept
> and the previous ones are discarded; cf.:
>
> >>> re.findall('(?i)(a)x(b)+','axbB')
> [('a', 'B')]
>
> (for multiple capturing groups in the pattern, a tuple of the captured
> parts is collected)
>
> or with your example, with the parts of the string differentiated by
> upper/lower case:
> >>> re.findall('(?i)(ab){2}','aBzzAbAB')
> ['AB']
>
> hth,
> vbr
Your explanation of the re.findall() results is correct. My point is that
the documentation states:
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of
strings
and this is not what re.findall() does when the pattern contains a
capturing group. IMHO it would be more reasonable to get back the whole
matches, since that seems to me the most useful information for the user.
In any case I'll go with finditer(), which returns match objects carrying
all the information anyone could look for.
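
For reference, a minimal sketch of how the whole matches can be recovered,
using the same pattern and string as earlier in the thread. The variable
names are only illustrative; the non-capturing group (?:...) is an
alternative not mentioned above, shown next to the finditer() approach.

```python
import re

text = 'abzzabab'

# With a capturing group, findall() returns the group's last capture,
# not the whole match.
captured = re.findall(r'(ab){2}', text)

# A non-capturing group (?:...) leaves no groups in the pattern,
# so findall() falls back to returning the whole match.
whole_noncapturing = re.findall(r'(?:ab){2}', text)

# finditer() yields match objects; group(0) is always the whole match,
# and span() gives its position in the string.
whole_finditer = [m.group(0) for m in re.finditer(r'(ab){2}', text)]
spans = [m.span() for m in re.finditer(r'(ab){2}', text)]

print(captured)            # ['ab']
print(whole_noncapturing)  # ['abab']
print(whole_finditer)      # ['abab']
print(spans)               # [(4, 8)]
```

The finditer() form keeps the capturing group available via m.group(1)
if it is still needed, while the non-capturing form is the shorter choice
when only the matched text matters.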