Beginner Question: Global string substitution with re
gherron at islandtraining.com
Mon Sep 22 11:40:06 CEST 2003
On Monday 22 September 2003 02:22 am, peter leonard wrote:
> This is a basic question, but I can't figure out what is wrong, even after
> reading the documentation. I have a script that normalizes strings. One of
> the steps is to convert all fractions to the tag 'fraction'. For example:
> import re
> line = "This is the first ratio, 170/37, and this is the second 170/37 "
> def normalise(text):
>     # Tag fractions
>     fraction = r'(\s+\d+\/\d+\s+)'
>     regfr = re.compile(fraction)
>     text = regfr.sub(" |fraction| ",text)
>     # Remove punctuation
>     punc = r'\,'
>     regpunc = re.compile(punc)
>     text = regpunc.sub("",text)
>     return text
> print line,"\n"
> print normalise(line),"\n"
> The output from this script is :
> This is the first ratio, 170/37, and this is the second 170/37
> This is the first ratio 170/37 and this is the second |fraction|
> I can't understand why only one of the fractions gets substituted. The
> documentation for sub states that the default value of the count argument
> is 0, which means replace all occurrences. The output of my script should be:
> This is the first ratio |fraction| and this is the second |fraction|
The problem is that your regular expression ends with "\s+". This means
the digits of the fraction *must* be followed by at least one space,
and the digits of your first fraction are followed by a comma, not a space.
Your re is matching spaces--fraction--spaces. I'd guess that you
don't really want to match spaces on either side of the fraction.
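A minimal sketch of that suggestion, written for Python 3 (the original script uses Python 2 print statements). The names ORIGINAL, FRACTION and PUNCTUATION are made up for this illustration, and replacing with '|fraction|' without padding spaces is an assumption about the output you want:

```python
import re

line = "This is the first ratio, 170/37, and this is the second 170/37 "

# The original pattern requires whitespace on both sides of the fraction,
# so "170/37," (followed by a comma) is never matched.
ORIGINAL = re.compile(r'(\s+\d+\/\d+\s+)')
matches = ORIGINAL.findall(line)
print(matches)  # only the second fraction is found

# Hypothetical fix: match the fraction alone, with no surrounding whitespace.
FRACTION = re.compile(r'\d+/\d+')
PUNCTUATION = re.compile(r',')

def normalise(text):
    # Tag fractions; the spaces around them are left untouched.
    text = FRACTION.sub('|fraction|', text)
    # Remove punctuation (commas only, as in the original script).
    text = PUNCTUATION.sub('', text)
    return text

result = normalise(line)
print(result)
```

With this pattern both fractions are tagged. Note that a bare \d+/\d+ would also match part of something like a date written 22/09/2003, so the pattern may need tightening depending on the input.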