[Tutor] regex eats even when not hungry
Thomas
tavspam at gmail.com
Fri Feb 16 18:14:53 CET 2007
I have the following mostly working function to extract the first
4-digit year from some text. But a leading space confounds it for
years starting with 20..:
import re

def getyear(text):
    s = """(?:.*?(19\d\d)|(20\d\d).*?)"""
    p = re.compile(s,re.IGNORECASE|re.DOTALL) #|re.VERBOSE
    y = p.match(text)
    try:
        return y.group(1) or y.group(2)
    except:
        return ''

>>> getyear('2002')
'2002'
>>> getyear(' 2002')
''
>>> getyear(' 1902')
'1902'
A regex of ".*?" means any number of any characters, with a non-greedy
hunger (so to speak), right?
Any ideas on what is causing this to fail?
Many thanks in advance,
Thomas
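
A possible explanation, offered as a sketch rather than a definitive answer: "|" has the lowest precedence in a regex, so the pattern above reads as ".*?(19\d\d)" OR "(20\d\d).*?". Only the 19.. branch gets the leading ".*?", and match() is anchored at the start of the string, so ' 2002' fits neither branch and y comes back as None. The name getyear_fixed below is hypothetical, not part of the original function; it confines the alternation to the century prefix and uses search() instead of match().

```python
import re

# Hypothetical rewrite: the alternation covers only the "19"/"20" prefix,
# and search() scans forward rather than anchoring at position 0.
YEAR = re.compile(r"(?:19|20)\d\d")

def getyear_fixed(text):
    """Return the first 19xx or 20xx run of four digits in text, or ''."""
    m = YEAR.search(text)
    return m.group(0) if m else ''
```

With this sketch, getyear_fixed(' 2002') returns '2002' and getyear_fixed(' 1902') returns '1902'. It will also match four such digits embedded in a longer number; adding \b on both sides of the pattern would be one way to rule that out.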