Different number of matches from re.findall and re.split
steve at holdenweb.com
Mon Jan 11 20:11:11 CET 2010
> On Jan 11, 8:44 am, Iain King <iaink... at gmail.com> wrote:
>> On Jan 11, 3:35 pm, Jeremy <jlcon... at gmail.com> wrote:
>>> Hello all,
>>> I am using re.split to separate some text into logical structures.
>>> The trouble is that re.split doesn't find everything while re.findall
>>> does; i.e.:
>>>> found = re.findall('^ 1', line, re.MULTILINE)
>>>> tables = re.split('^ 1', line, re.MULTILINE)
>>> Can someone explain why these two commands are giving different
>>> results? I thought I should have the same number of matches (or maybe
>>> different by 1, but not 6000!)
>> re.split doesn't take re.MULTILINE as a flag: it doesn't take any
>> flags. It does take a maxsplit parameter, to which you are passing the
>> value of re.MULTILINE (which happens to be 8 in my implementation).
>> Since your pattern is looking for line starts, and your first line
>> presumably has more splits than the maxsplits you are specifying, your
>> re.split never finds more than 1.
> Yep. Thanks for pointing that out. I guess I just assumed that
> re.split was similar to re.search/match/findall in what it accepted as
> function parameters. I guess I'll have to use a \n instead of a ^ for
Remember you can specify flags inside the pattern itself.
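A minimal sketch of what that looks like, assuming some made-up sample text (the original data is not shown in this thread). The (?m) prefix turns on MULTILINE from inside the pattern, so re.split needs no extra argument. The last call spells out what the original re.split call was effectively doing, with re.MULTILINE landing in maxsplit:

```python
import re

# Hypothetical sample text, standing in for the poster's real input.
text = " 1 alpha\n 2 beta\n 1 gamma\n 2 delta\n 1 epsilon\n"

# (?m) enables MULTILINE inside the pattern itself,
# so ^ matches at the start of every line.
pattern = r'(?m)^ 1'

found = re.findall(pattern, text)
tables = re.split(pattern, text)

print(len(found))   # number of matches
print(len(tables))  # number of pieces: one more than the number of matches

# What the original call amounts to: the flag value is used as maxsplit,
# and without MULTILINE the ^ only matches at the very start of the string.
misuse = re.split('^ 1', text, maxsplit=re.MULTILINE)
print(len(misuse))
```

Later Python versions (2.7 and 3.1 onward) also accept flags as a keyword argument to re.split, e.g. re.split('^ 1', text, flags=re.MULTILINE); the inline form works either way.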
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/