[Tutor] regex eats even when not hungry

Thomas tavspam at gmail.com
Fri Feb 16 18:14:53 CET 2007


I have the following mostly working function to extract the first
four-digit year from some text. But a leading space confounds it for
years starting with 20:

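import re
def getyear(text):
    # Intended: skip any leading text, then capture a year starting with 19 or 20
    s = r"""(?:.*?(19\d\d)|(20\d\d).*?)"""
    p = re.compile(s,re.IGNORECASE|re.DOTALL) #|re.VERBOSE
    y = p.match(text)
    try:
        return y.group(1) or y.group(2)
    except AttributeError:  # p.match() returned None
        return ''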



>>> getyear('2002')
'2002'
>>> getyear(' 2002')
''
>>> getyear(' 1902')
'1902'

A regex of ".*?" means any number of any characters, with a non-greedy
hunger (so to speak), right?
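
As a quick check of that reading, here is a minimal sketch comparing the lazy ".*?" with the greedy ".*" on a made-up sample string (the string and the variable names are only for illustration, not part of the function above):

```python
import re

text = "in 1999 and again in 2002"

# Lazy: .*? consumes as few characters as possible before the group can match.
lazy = re.match(r".*?(\d\d\d\d)", text, re.DOTALL)

# Greedy: .* consumes as much as possible, then backtracks until the group fits.
greedy = re.match(r".*(\d\d\d\d)", text, re.DOTALL)

print(lazy.group(1))    # 1999
print(greedy.group(1))  # 2002
```

So the lazy form stops at the first place the rest of the pattern can match, while the greedy form ends up at the last one.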

Any ideas on what is causing this to fail?
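
One possible explanation, offered as a sketch rather than a definitive answer: "|" has the lowest precedence in a regex, so the pattern may be read as two alternatives, ".*?(19\d\d)" or "(20\d\d).*?", and only the first one carries the leading ".*?". Since match() anchors at the start of the text, the second alternative would then need the year at position 0. The rewritten pattern and the name getyear_fixed below are hypothetical, assuming the intent is "skip anything, then capture a year starting with 19 or 20":

```python
import re

# Original pattern: the alternation splits it into
#   .*?(19\d\d)    or    (20\d\d).*?
original = re.compile(r"(?:.*?(19\d\d)|(20\d\d).*?)", re.DOTALL)

# Hypothetical rewrite: the alternation sits inside the capture group,
# so the lazy prefix applies to both kinds of year.
fixed = re.compile(r".*?((?:19|20)\d\d)", re.DOTALL)

def getyear_fixed(text):
    """Return the captured year, or '' if the pattern does not match."""
    m = fixed.match(text)
    return m.group(1) if m else ''

print(original.match(' 2002'))        # None
print(repr(getyear_fixed('2002')))    # '2002'
print(repr(getyear_fixed(' 2002')))   # '2002'
print(repr(getyear_fixed(' 1902')))   # '1902'
```

Using re.search() with just the year part of the pattern would be another way to avoid the leading ".*?" altogether.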

Many thanks in advance,
Thomas

