[Tutor] re module / separator

Wed Jun 24 20:24:17 CEST 2009

Hi!

I am trying to split some lists out of a single text file, and I am
having a hard time. I have reduced the problem to the following one:

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

From this line of text, I want to extract the runs of consecutive words
that each start with "a" and end with "b.". I don't want a list of
individual words, though. I want this:

["a2345b.", "a45453b. a325643b. a435643b."]
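A minimal sketch of one pattern that appears to produce exactly this list
for the sample text. The name run_pattern is hypothetical, and the pattern
assumes that no word contains an "a" in the middle (it does not check where
a word begins) and that words are separated by whitespace:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# Hypothetical pattern: one "a...b." word, then zero or more further
# "a...b." words, each preceded by whitespace. The group is non-capturing,
# so findall() returns the whole matched run instead of a group.
run_pattern = r"a[^.]*b\.(?:\s+a[^.]*b\.)*"

runs = re.findall(run_pattern, text)
print(runs)  # ['a2345b.', 'a45453b. a325643b. a435643b.']
```

Because the separating whitespace is matched only between words, the runs
carry no trailing space.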

I feel I still don't fully understand the logic of regular expressions.
I do not understand the results below:

In [33]: re.search(r"(a[^.]*?b\.\s?){2}", text).group(0)
Out[33]: 'a45453b. a325643b. '

In [34]: re.findall(r"(a[^.]*?b\.\s?){2}", text)
Out[34]: ['a325643b. ']

In [35]: re.search(r"(a[^.]*?b\.\s?)+", text).group(0)
Out[35]: 'a2345b. '

In [36]: re.findall(r"(a[^.]*?b\.\s?)+", text)
Out[36]: ['a2345b. ', 'a435643b. ']

What is the difference between search and findall in [33] and [34]? And
why can't I generalize [33] to [35]? Out[35] would make sense to me if I
had used a non-greedy +, but why does re match only one word?
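
The sketch below may help in comparing the two calls. It relies on two
documented behaviours of the re module: search() returns only the first
(leftmost) match, and findall() returns the contents of the capturing group
rather than the whole match when the pattern contains exactly one group. A
repeated group keeps only its last repetition. The variable names are
illustrative only:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
pattern = r"(a[^.]*?b\.\s?)+"

# search() stops at the leftmost match. The run that starts at position 0
# is a single word, because the next word ("f325.") does not start with "a".
first = re.search(pattern, text)
print(repr(first.group(0)))  # whole match: 'a2345b. '

# finditer() exposes both views for every match:
# group(0) is the whole run, group(1) is the last repetition of the group.
for m in re.finditer(pattern, text):
    print(repr(m.group(0)), "|", repr(m.group(1)))

# Collecting group(0) gives the whole runs, while findall() gives group(1).
whole_matches = [m.group(0) for m in re.finditer(pattern, text)]
last_repetitions = re.findall(pattern, text)
```

Here whole_matches holds the complete runs (each with a trailing space),
while last_repetitions reproduces Out[36].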

Thanks,

Tiago Saboga.