trailing space in RE
wurmy at earthlink.net
Sat Aug 3 02:43:10 CEST 2002
Doru-Catalin Togea wrote:
> I have written a little script to parse some Bible text, and to this
> purpose I defined the following re:
> bibleRef = r'(\w+) (\d+):(\d+) (.+)'
> Everything works fine, but I have a problem in that "the rest of the
> text" always has a trailing space like this:
> "Gen 1:1 In the beginning God created the heavens and the earth. "
> "1Co 10:12 Therefore let him who thinks he stands take heed lest he
> fall. "
> So my question is, how do I match "the rest of the text" but not the last
> character (which is a space)?
In addition to the solutions already proposed, you can do the following:
>>> import re
>>> bibleref = r"(\w+) (\d+):(\d+) (.+?) *$"
>>> s = "Foo 1:1 In the beginning Guido created Python. "
>>> m = re.search(bibleref, s)
>>> m.groups()
('Foo', '1', '1', 'In the beginning Guido created Python.')
The "(.+?)" matches non-greedily, and the " *$" matches any spaces after that,
so they're not included in the group. You could also use "\s*$", so that all
trailing whitespace is matched, including newlines.
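
To make the difference between the two endings concrete, here is a small
sketch. The names spaces_only, any_space and rest_of_text are made up for
illustration, and the input line with a Windows-style "\r\n" ending is an
assumed case, not something from the original question:

```python
import re

# Same pattern as above, ending in " *$": only trailing spaces are excluded.
spaces_only = re.compile(r"(\w+) (\d+):(\d+) (.+?) *$")
# Variant ending in "\s*$": any trailing whitespace is excluded.
any_space = re.compile(r"(\w+) (\d+):(\d+) (.+?)\s*$")

def rest_of_text(pattern, line):
    """Return the fourth group (the verse text), or None if nothing matches."""
    m = pattern.search(line)
    return m.group(4) if m else None

# Assumed input: a trailing space followed by a "\r\n" line ending.
line = "1Co 10:12 Take heed. \r\n"

print(repr(rest_of_text(spaces_only, line)))  # 'Take heed. \r'
print(repr(rest_of_text(any_space, line)))    # 'Take heed.'
```

With " *$" the "\r" is not a space, so the non-greedy group has to grow until
it swallows it; with "\s*$" everything after the final period is left out.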
The Pythonic Quarter:: http://www.awaretek.com/nowak/