Beginner Question: Global string substitution with re

Gary Herron gherron at islandtraining.com
Mon Sep 22 05:40:06 EDT 2003


On Monday 22 September 2003 02:22 am, peter leonard wrote:
> Hi,
> This is a basic question but I can't figure out what is wrong - even after
> reading the documentation. I have a script that normalizes strings. One of
> the steps is to convert all fractions to the tag 'fraction'. For example:
>
> import re
> line = "This is the first ratio, 170/37, and this is the second  170/37 "
>
>
> def normalise(text):
>
>     #Tag fractions
>     fraction = r'(\s+\d+\/\d+\s+)'
>     regfr = re.compile(fraction)
>     text = regfr.sub(" |fraction| ",text)
>
>     #Remove punctuation
>     punc = r'\,'
>     regpunc = re.compile(punc)
>     text = regpunc.sub("",text)
>
>     return text
>
> print line,"\n"
> print normalise(line),"\n"
>
>
> The output from this script is :
>
> This is the first ratio, 170/37, and this is the second  170/37
>
> This is the first ratio 170/37 and this is the second |fraction|
>
>
> I can't understand why only one of the fractions gets substituted. The
> documentation for sub states that the default value of the count argument
> is 0, which means replace all occurrences. The output of my script should be:
>
> This is the first ratio |fraction| and this is the second |fraction|

The problem is that your regular expression ends with "\s+".  This means
the digits of the fraction *must* be followed by at least one space,
and the digits of your first fraction are followed by a comma and not
a space.
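
A quick way to see this is to ask the compiled pattern what it actually
finds in the sample line. This is only a small sketch reusing the pattern
and the string from the question; the name "matches" is introduced here
for illustration.

```python
import re

line = "This is the first ratio, 170/37, and this is the second  170/37 "

# The pattern from the question: whitespace, digits/digits, whitespace.
regfr = re.compile(r'(\s+\d+\/\d+\s+)')

# With a single group, findall returns the text captured by that group
# for every non-overlapping match in the string.
matches = regfr.findall(line)
print(matches)
```

Only the second fraction shows up in the list, with its surrounding
spaces included. The first one is skipped because the character after
"37" is a comma.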

Your re matches (and replaces) spaces, then the fraction, then spaces.
I'd guess that you don't really want to match spaces on either side of
the fraction.
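
One possible way to do that, as a minimal sketch: drop the "\s+" parts
and use the word-boundary assertion "\b" so that only the digits and the
slash are matched. The choice of "\b" is an assumption about what counts
as a fraction in your data (for instance, "1/2/3" would be tagged only in
part), so treat it as a starting point rather than the definitive pattern.

```python
import re

line = "This is the first ratio, 170/37, and this is the second  170/37 "

def normalise(text):
    # Tag fractions: match only digits/digits, bounded by word boundaries,
    # so surrounding spaces or punctuation are neither required nor consumed.
    text = re.sub(r'\b\d+/\d+\b', '|fraction|', text)
    # Remove commas, as in the original script.
    text = re.sub(r',', '', text)
    return text

result = normalise(line)
print(result)
```

Because the whitespace is left alone, the double space before the second
fraction and the trailing space in the input survive in the result. If
that matters, a separate whitespace-collapsing step could be added.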

Gary Herron






