regular expressions and dollar sign ($) in source string
Peter Otten
__peter__ at web.de
Thu Apr 23 03:13:17 EDT 2009
Jax wrote:
> I encountered a problem with the dollar sign in the source string. It seems
> that $ requires special treatment. Below is a copy of a session with the
> interactive Python shell:
>
> Python 2.5.2 (r252:60911, Jan 8 2009, 12:17:37)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> a = unicode(r"(instead of $399.99)", "utf8")
> >>> print re.search(unicode(r"^\(instead of.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*\$(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 399.99
>
> My question is: Why is only the third regular expression correct?
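They are all correct; they just don't give what you expect. This has nothing
to do with the $. The ".*" expression is "greedy": it tries to match as
many characters as possible and gives characters back only until the rest of
the pattern can match, which leaves a single digit for the "\d+" before the
dot. You can see that by adding another group: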
>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of(.*)(\d+[.]\d+)\)$", a).groups()
(u' $39', u'9.99')
Fortunately there is also a non-greedy variant ".*?" which matches as few
characters as possible:
>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of.*?(\d+[.]\d+)\)$", a).group(1)
u'399.99'
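For readers on Python 3, where the u/ur string prefixes and the print
statement no longer exist, the same comparison could be sketched roughly as
follows. The variable names (text, greedy, lazy) are only illustrative, and
the extra group around ".*?" is added here just to show what it consumes:

```python
import re

text = "(instead of $399.99)"

# Greedy: ".*" first grabs everything, then backs off one character at a
# time until "(\d+[.]\d+)\)$" can match, so "\d+" gets only one digit.
greedy = re.search(r"^\(instead of(.*)(\d+[.]\d+)\)$", text)
print(greedy.groups())  # (' $39', '9.99')

# Non-greedy: ".*?" starts empty and grows only as far as needed, so the
# number group starts at the first position where the rest can match.
lazy = re.search(r"^\(instead of(.*?)(\d+[.]\d+)\)$", text)
print(lazy.groups())  # (' $', '399.99')
```

In both cases the pattern as a whole matches the same text; only the split
between the ".*" part and the number group differs.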
Peter