[Tutor] re module / separator
Tiago Saboga
tiagosaboga at gmail.com
Wed Jun 24 20:24:17 CEST 2009
Hi!
I am trying to split some lists out of a single text file, and I am
having a hard time. I have reduced the problem to the following one:
text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
From this line of text, I want to extract the runs of consecutive words
that start with "a" and end with "b.". But I don't want a list of
individual words. I want this:
["a2345b.", "a45453b. a325643b. a435643b."]
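A minimal sketch of one way that list might be produced, assuming the
words are separated by at most one whitespace character and that the
trailing space of each run can be stripped. The names run_pattern and
runs are only illustrative. The group is written as non-capturing,
(?:...), so that findall returns whole matches:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# (?:...) is a non-capturing group, so findall returns each whole match
# rather than the contents of a capturing group.
run_pattern = re.compile(r"(?:a[^.]*?b\.\s?)+")

# Each match is a run of consecutive "a...b." words; strip the trailing space.
runs = [m.strip() for m in run_pattern.findall(text)]
print(runs)
```

Note that this pattern, like the ones below, does not anchor "a" to the
start of a word, so it could also start matching in the middle of a word
that contains an "a".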
And I feel I still don't fully understand the logic of regular
expressions. I do not understand the results below:
In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
Out[33]: 'a45453b. a325643b. '
In [34]: re.findall("(a[^.]*?b\.\s?){2}", text)
Out[34]: ['a325643b. ']
In [35]: re.search("(a[^.]*?b\.\s?)+", text).group(0)
Out[35]: 'a2345b. '
In [36]: re.findall("(a[^.]*?b\.\s?)+", text)
Out[36]: ['a2345b. ', 'a435643b. ']
What's the difference between search and findall in [33-34]? And why
can't I generalize [33] to [35]? Out[35] would make sense to me if I had
used a non-greedy +, but why does re get only one word?
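A small sketch that may help compare the two calls, using the pattern
from [35] and [36]. It relies on two documented behaviours of re: when
the pattern contains one capturing group, findall returns that group's
contents rather than the whole match, and a repeated group keeps only
its last repetition. The names whole_matches and last_captures are only
illustrative:

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
pattern = r"(a[^.]*?b\.\s?)+"

whole_matches = []   # what group(0) reports for each match
last_captures = []   # what group(1) reports: the last repetition only
for m in re.finditer(pattern, text):
    whole_matches.append(m.group(0))
    last_captures.append(m.group(1))

print(whole_matches)
print(last_captures)
# findall returns the same values as group(1), not group(0)
print(re.findall(pattern, text) == last_captures)
```

As for [35], search returns only the first match in the text. That match
starts at "a2345b. ", and the + cannot repeat there because the next
word, "f325.", does not start with "a". The longer run is a separate,
later match.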
Thanks,
Tiago Saboga.