[Tutor] regex eats even when not hungry
kent37 at tds.net
Fri Feb 16 18:27:54 CET 2007
> I have the following mostly working function to strip the first 4
> digit year out of some text. But a leading space confounds it for
> years starting 20..:
> import re
> def getyear(text):
>     s = """(?:.*?(19\d\d)|(20\d\d).*?)"""
>     p = re.compile(s,re.IGNORECASE|re.DOTALL) #|re.VERBOSE
>     y = p.match(text)
>     if y:
>         return y.group(1) or y.group(2)
>     return ''
> >>> getyear(' 2002')
> >>> getyear(' 1902')
> A regex of ".*?" means any number of any characters, with a non-greedy
> hunger (so to speak) right?
> Any ideas on what is causing this to fail?
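The | character has very low precedence in a regex, so it splits everything
inside the group into two alternatives. You are matching either
- any number of characters followed by 19xx, or
- 20xx followed by any number of characters.
Because match() only matches at the start of the string, the second
alternative needs 20xx to be the very first characters, and the leading
space prevents that.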
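The two alternatives can be checked directly. This is a small sketch using the pattern from the question; the variable names are only for illustration.

```python
import re

# The pattern from the question: the | splits everything inside (?:...).
original = re.compile(r"(?:.*?(19\d\d)|(20\d\d).*?)", re.DOTALL)

# match() anchors at position 0 of the text.
m_19 = original.match(" 1902")         # first alternative: lazy .*? skips the space
m_20 = original.match(" 2002")         # neither alternative fits at position 0
m_20_nospace = original.match("2002")  # second alternative: 20xx is at the start

print(m_19.group(1))
print(m_20)
print(m_20_nospace.group(2))
```
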
You could use this instead:
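One possible form, as a sketch only: it assumes the goal is to keep match() and move the lazy wildcard outside the alternation, so that both year forms share the same prefix. The exact pattern is an assumption, not a quotation.

```python
import re

def getyear(text):
    # Assumed rewrite: the lazy prefix .*? applies to both alternatives,
    # and a single group captures whichever year form is found first.
    p = re.compile(r".*?(19\d\d|20\d\d)", re.DOTALL)
    y = p.match(text)
    return y.group(1) if y else ''
```

With this version a leading space no longer matters for either century, and text without a matching year gives an empty string.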
But why not use p.search(), which will find the string anywhere without
needing the wildcards? Then your regex could be just the two year
alternatives, with no wildcards or groups around them, and you return
just y.group()
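A minimal sketch of the search() approach, assuming the regex is reduced to the bare alternation of the two year forms; the name YEAR is hypothetical.

```python
import re

# Assumed pattern: just the two year forms, no wildcards and no groups.
YEAR = re.compile(r"19\d\d|20\d\d")

def getyear(text):
    # search() scans the whole text, so no leading .*? is needed.
    y = YEAR.search(text)
    # group() with no argument returns the entire match.
    return y.group() if y else ''
```

Since search() returns None when nothing is found, the function still falls back to an empty string in that case.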