trailing space in RE
Hans Nowak
wurmy at earthlink.net
Fri Aug 2 20:43:10 EDT 2002
Doru-Catalin Togea wrote:
> I have written a little script to parse some Bible text, and to this
> purpose I defined the following re:
>
> bibleRef = r'(\w+) (\d+):(\d+) (.+)'
>
> Everything works fine, but I have a problem in that "the rest of the
> text" always has a trailing space like this:
>
> "Gen 1:1 In the beginning God created the heavens and the earth. "
> "1Co 10:12 Therefore let him who thinks he stands take heed lest he
> fall. "
>
> So my question is, how do I match "the rest of the text" but not the last
> character (which is a space)?
In addition to the solutions already proposed, you can do the following:
>>> import re
>>> bibleref = r"(\w+) (\d+):(\d+) (.+?) *$"
>>> s = "Foo 1:1 In the beginning Guido created Python. "
>>> m = re.search(bibleref, s)
>>> m.groups()
('Foo', '1', '1', 'In the beginning Guido created Python.')
The "(.+?)" group matches non-greedily, and the " *$" then matches any spaces
between it and the end of the string, so they're not included in the group.
You could also use "\s*$", so that all trailing whitespace is matched,
including tabs and newlines.
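A small sketch comparing the two endings described above, run against the two sample verses from the question. The tab-and-newline line at the end is a made-up example, an assumption about what a line read from a file might look like. One detail worth knowing: without re.MULTILINE, "$" also matches just before a single trailing newline, so " *$" already copes with a bare newline; other whitespace, such as a tab, is what stays in the group.

```python
import re

# Pattern from the post: non-greedy text group, then optional spaces to the end.
SPACES_ONLY = re.compile(r"(\w+) (\d+):(\d+) (.+?) *$")
# Variant mentioned in the post: \s*$ absorbs any trailing whitespace.
ANY_WHITESPACE = re.compile(r"(\w+) (\d+):(\d+) (.+?)\s*$")

# The two sample verses from the question, each ending in a single space.
verses = [
    "Gen 1:1 In the beginning God created the heavens and the earth. ",
    "1Co 10:12 Therefore let him who thinks he stands take heed lest he fall. ",
]
for v in verses:
    print(SPACES_ONLY.search(v).groups())

# Hypothetical line as read from a file: trailing space, a tab, and a newline.
line = "Gen 1:1 In the beginning God created the heavens and the earth. \t\n"
print(repr(SPACES_ONLY.search(line).group(4)))     # trailing space and tab survive
print(repr(ANY_WHITESPACE.search(line).group(4)))  # all trailing whitespace removed
```

Note that "\w+" also matches book abbreviations that start with a digit, such as "1Co", because "\w" includes digits.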
HTH,
--
Hans (base64.decodestring('d3VybXlAZWFydGhsaW5rLm5ldA=='))
# decode for email address ;-)
The Pythonic Quarter:: http://www.awaretek.com/nowak/