[Tutor] re module- puzzling results when matching money

Alex Kleider akleider at sonic.net
Sat Aug 3 20:15:28 CEST 2013


#!/usr/bin/env python

"""
I've been puzzling over the re module and have a couple of questions
regarding the behaviour of this script.

I've provided two possible patterns (re_US_money):
the one surrounded by the 'word boundary' meta sequence seems not to work
while the other one does. I can't understand why the addition of the word
boundary defeats the match.

I also don't understand why the split method includes the matched text.
Splitting only works as I would have expected if no groupings are used.

If I've set this up as intended, the full body of this e-mail should be
executable as a script.

Comments appreciated.
alex kleider
"""

# file :  tutor.py (Python 2.7, NOT Python 3)
print 'Running "tutor.py" on an Ubuntu Linux machine. *********'

import re

target = \
"""Cost is $4.50. With a $.30 discount:
Price is $4.15.
The price could be less, say $4 or $4.
Let's see how this plays out:  $4.50.60
"""

# Choose one of the following two alternatives:
re_US_money =\
r"((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})"
# The above provides matches.
# The following does NOT.
# re_US_money =\
# r"\b((?P<sign>\$)(?P<dollars>\d{0,})(?:\.(?P<cents>\d{2})){0,1})\b"

pat_object = re.compile(re_US_money)
match_object = pat_object.search(target)
if match_object:
     print "'match_object.group()' and 'match_object.span()' yield:"
     print match_object.group(), match_object.span()
     print
else:
     print "NO MATCH FOUND!!!"
print
print "Now will use 'finditer()':"

print
# enumerate() numbers the matches from 1; the name 'match' avoids
# shadowing the built-in iter().
for i, match in enumerate(pat_object.finditer(target), 1):
     print
     print "iter #%d: "%(i, ),
     print match.group()
     print "'groups()' yields: '%s'."%(match.groups(), )
     print match.span()
     sign = match.group("sign")
     dollars = match.group("dollars")
     cents = match.group("cents")
     print sign,
     print "  ",
     if dollars:
         print dollars,
     else:
         print "00",
     print "  ",
     if cents:
         print cents,
     else:
         print "00",

print

t = target
sub_target = pat_object.sub("<insert value here>", t)
print
print "Printing substitution: "
print sub_target
split_target = pat_object.split(target)
print "Result of splitting on the target: "
print split_target

# End of script.
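
On the word-boundary question, a likely explanation is sketched here. \b matches only where a word character (\w) meets a non-word character. In the target above every "$" is preceded by a space, and both are non-word characters, so the leading \b can never succeed there. The names `sample` and `money` are illustrative, and the pattern is a simplified stand-in for re_US_money without the named groups. print() is written with parentheses so the snippet should run under both Python 2.7 and Python 3.

```python
import re

sample = "Cost is $4.50. With a $.30 discount"

# Simplified stand-in for re_US_money (no named groups).
money = r"\$\d*(?:\.\d{2})?"

# No boundary exists between " " and "$" (both are non-word characters),
# so the leading \b rules out every candidate position in this sample.
with_boundary = re.findall(r"\b" + money, sample)
without_boundary = re.findall(money, sample)

# \b does match where a word character meets a non-word character,
# for example between "x" and "$".
glued = re.findall(r"\b" + money, "x$5")

print(with_boundary)
print(without_boundary)
print(glued)
```

If the intent was only to stop a match from running into adjacent characters, \b is probably the wrong tool whenever the match begins with a non-word character such as "$".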

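On the split question: re.split() is documented to return the text of every capturing group along with the pieces between matches, and a group that took no part in a match contributes None. The sketch below uses an illustrative `sample` string and a shortened pattern with three capturing groups. The original re_US_money has four, counting the outer parentheses, so it would contribute four extra items per match.

```python
import re

sample = "Cost is $4.50, or about $4 net"

# Capturing groups: each group's text is returned between the pieces.
# The optional cents group yields None when it does not take part.
with_groups = re.split(r"(\$)(\d*)(?:\.(\d{2}))?", sample)

# Same pattern with no capturing groups: only the surrounding text remains.
without_groups = re.split(r"\$\d*(?:\.\d{2})?", sample)

print(with_groups)
print(without_groups)
```

Writing the groups as non-capturing, (?:...), keeps the grouping for quantifiers while leaving the matched text out of the split result.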
