trailing space in RE

Hans Nowak wurmy at earthlink.net
Fri Aug 2 20:43:10 EDT 2002


Doru-Catalin Togea wrote:

> I have written a little script to parse some Bible text, and to this
> purpose I defined the following re:
> 	
> 	bibleRef = r'(\w+) (\d+):(\d+) (.+)'
> 
> Everything works fine, but I have a problem in that "the rest of the
> text" always has a trailing space like this:
> 	
> "Gen 1:1 In the beginning God created the heavens and the earth. "
> "1Co 10:12 Therefore let him who thinks he stands take heed lest he
> fall. "
>  
> So my question is, how do I match "the rest of the text" but not the last
> character (which is a space)?

In addition to the solutions already proposed, you can do the following:

 >>> import re
 >>> bibleref = r"(\w+) (\d+):(\d+) (.+?) *$"
 >>> s = "Foo 1:1 In the beginning Guido created Python.    "
 >>> m = re.search(bibleref, s)
 >>> m.groups()
('Foo', '1', '1', 'In the beginning Guido created Python.')

The "(.+?)" matches non-greedily, and the " *$" matches any spaces after that, 
so they're not included in the group. You could also use "\s*$", so that all 
trailing whitespace is matched, including newlines.
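
For what it's worth, here is a minimal sketch of the "\s*$" variant applied to a line like the ones in the original question. The names BIBLE_REF and parse_ref are made up for illustration and are not part of the original script; the sketch also assumes each verse arrives as a single line, possibly ending in a newline.

```python
import re

# Same idea as the pattern above, but with \s*$ instead of " *$",
# so trailing spaces, tabs and newlines are all left out of the last group.
# BIBLE_REF and parse_ref are hypothetical names used only for this sketch.
BIBLE_REF = re.compile(r"(\w+) (\d+):(\d+) (.+?)\s*$")


def parse_ref(line):
    """Return (book, chapter, verse, text), or None if the line does not match."""
    m = BIBLE_REF.search(line)
    return m.groups() if m else None


print(parse_ref("Gen 1:1 In the beginning God created the heavens and the earth. \n"))
# prints ('Gen', '1', '1', 'In the beginning God created the heavens and the earth.')
```

Because "(.+?)" is non-greedy, it stops at the earliest point where the rest of the line is only whitespace, which is why the trailing space and newline end up outside the group.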

HTH,

-- 
Hans (base64.decodestring('d3VybXlAZWFydGhsaW5rLm5ldA=='))
# decode for email address ;-)
The Pythonic Quarter:: http://www.awaretek.com/nowak/



