"Zeroing out" the Nth group in a RE
Doru-Catalin Togea
doru-cat at ifi.uio.no
Sat Aug 3 11:41:18 EDT 2002
Hi all!
I want to match all occurrences of Bible references in a string like:
refs = 'gen 5:17 - 23 , lev 14:20, rev 19:10 - 25'
There are two kinds of Bible references:
- simple, like 'lev 14:20'
- a range, like 'gen 5:17-23' # an extension of the simple reference
I want a very general RE which matches all references, both simple ones
and ranges, at once. Running the following code
import re
bibleRef = re.compile(r'(?:(?:(\w+)(?:\s+)(\d+):(\d+))(?:(?:\s*)(?:-)(?:\s*)(\d+))?)')
m = bibleRef.findall(refs)
print(m)
outputs:
[('gen', '5', '17', '23'), ('lev', '14', '20', '23'), ('rev', '19', '10', '25')]
which is "mistaken" in that the second tuple should have been
('lev', '14', '20') or
('lev', '14', '20', '')
I tried to achieve this by wrapping the last part of my RE (the part
denoting the range extension) in a set of () and placing a '?' after it,
to say that this part is optional, that is, the RE should match whether
the range extension occurs or not.
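For comparison, here is a minimal sketch of the same optional-group idea, assuming a current Python 3 `re` module. There, as far as I know, a group that takes no part in a match is reported as None by `groups()` and as '' by `findall()`. The simplified pattern and the names BIBLE_REF and parse_refs are illustrative assumptions, not part of the code above.

```python
import re

# Simplified, equivalent form of the pattern: the redundant non-capturing
# groups are dropped, and only the range extension is wrapped in (?: ... )?
BIBLE_REF = re.compile(r'(\w+)\s+(\d+):(\d+)(?:\s*-\s*(\d+))?')

def parse_refs(text):
    """Return (book, chapter, verse, end_verse) tuples.

    end_verse is None when the reference is simple (no range extension).
    """
    return [m.groups() for m in BIBLE_REF.finditer(text)]

refs = 'gen 5:17 - 23 , lev 14:20, rev 19:10 - 25'
print(parse_refs(refs))
# [('gen', '5', '17', '23'), ('lev', '14', '20', None), ('rev', '19', '10', '25')]
```

Going through finditer() and groups() makes the "no range" case explicit as None; findall() on the same pattern would give '' in that position instead.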
So, how do I zero out this "fourth group" when I encounter simple
references?
Thank you if you can help.
Catalin
<<<< ================================== >>>>
<< We are what we repeatedly do. >>
<< Excellence, therefore, is not an act >>
<< but a habit. >>
<<<< ================================== >>>>