Different number of matches from re.findall and re.split

Mon Jan 11 10:44:45 EST 2010

On Jan 11, 3:35 pm, Jeremy <jlcon... at gmail.com> wrote:
> Hello all,
>
> I am using re.split to separate some text into logical structures.
> The trouble is that re.split doesn't find everything while re.findall
> does; i.e.:
>
> >>> found = re.findall('^ 1', line, re.MULTILINE)
> >>> len(found)
> 6439
> >>> tables = re.split('^ 1', line, re.MULTILINE)
> >>> len(tables)
> 1
>
> Can someone explain why these two commands are giving different
> results?  I thought I should have the same number of matches (or maybe
> different by 1, but not 6000!)
>
> Thanks,
> Jeremy

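re.split doesn't take re.MULTILINE as a flag: in this version it doesn't
take any flags at all. Its third positional parameter is maxsplit, so you
are passing the value of re.MULTILINE (which happens to be 8 in my
implementation) as maxsplit. The flag therefore never reaches the
pattern, and '^' only matches at the very start of the string, not at
the start of each line. Unless your text begins with ' 1', the pattern
never matches at all and re.split returns the whole string as a
one-element list. (Even if it did match everywhere, maxsplit=8 would cap
you at 9 pieces.)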
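A minimal sketch of the fix, assuming the data is one string in which each table starts on a line beginning with ' 1' (the sample text below is made up, standing in for the `line` variable). Compiling the pattern with re.MULTILINE and calling split on the compiled object avoids the problem, because Pattern.split has no flags parameter to confuse with maxsplit. On Python 2.7/3.1 and later, re.split also accepts flags, but only safely as a keyword argument.

```python
import re

# Made-up sample text: three "tables", each introduced by a line starting with " 1".
line = " 1 first table\n data\n 1 second table\n data\n 1 third table\n"

# Compile once with MULTILINE so '^' matches at the start of every line.
table_start = re.compile(r'^ 1', re.MULTILINE)

found = table_start.findall(line)
tables = table_start.split(line)

# Python 2.7+/3.1+ alternative: pass the flag by keyword, never positionally.
tables_kw = re.split(r'^ 1', line, flags=re.MULTILINE)

print(len(found))   # 3
print(len(tables))  # 4: an empty string before the first match, then one piece per table
```

With the flag actually applied, split returns one more piece than findall has matches (the first piece is empty here because the text starts with a match), which is the "different by 1" that was expected.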

>>> a
'split(pattern, string, maxsplit=0)\n    Split the source string by the occurrences of the pattern,\n    returning a list containing the resulting substrings.\n'
>>> re.split(" ", a, re.MULTILINE)
['split(pattern,', 'string,', 'maxsplit=0)\n', '', '', '', 'Split', 'the', 'source string by the occurrences of the pattern,\n    returning a list containing the resulting substrings.\n']
>>> re.split(" ", a)
['split(pattern,', 'string,', 'maxsplit=0)\n', '', '', '', 'Split', 'the', 'source', 'string', 'by', 'the', 'occurrences', 'of', 'the', 'pattern,\n', '', '', '', 'returning', 'a', 'list', 'containing', 'the', 'resulting', 'substrings.\n']
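To isolate the accidental maxsplit, here is a small check on a made-up twelve-word string: re.MULTILINE is the integer 8 in CPython, and a maxsplit of 8 yields at most 9 pieces, the same count as the first re.split call in the session. The sketch passes maxsplit by keyword to make the mistake explicit; recent CPython releases (3.13 and later, as far as I know) warn when maxsplit or flags are passed positionally.

```python
import re

print(int(re.MULTILINE))  # 8 in CPython

text = "a b c d e f g h i j k l"  # made-up input: 12 words, 11 spaces

# What re.split(" ", text, re.MULTILINE) actually does: maxsplit=8.
capped = re.split(" ", text, maxsplit=int(re.MULTILINE))
full = re.split(" ", text)

print(len(capped))  # 9: eight splits, the remainder left in the last piece
print(capped[-1])   # i j k l
print(len(full))    # 12
```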

Iain