How to insert string in each match using RegEx iterator

Peter Otten __peter__ at web.de
Wed Jun 10 11:13:50 EDT 2009


504crank at gmail.com wrote:

> I wonder if you (or anyone else) might attempt a different explanation
> for the use of the special sequence '\1' in the RegEx syntax.
> 
> The Python documentation explains:
> 
> \number
>     Matches the contents of the group of the same number. Groups are
> numbered starting from 1. For example, (.+) \1 matches 'the the' or
> '55 55', but not 'the end' (note the space after the group). This
> special sequence can only be used to match one of the first 99 groups.
> If the first digit of number is 0, or number is 3 octal digits long,
> it will not be interpreted as a group match, but as the character with
> octal value number. Inside the '[' and ']' of a character class, all
> numeric escapes are treated as characters.
> 
> In practice, this appears to be the key device in your
> clever solution:
> 
> >>> re.compile(r"(\d+)").sub(r"INSERT \1", string)
> 
> 'abc INSERT 123 def INSERT 456 ghi INSERT 789'
> 
> >>> re.compile(r"(\d+)").sub(r"INSERT ", string)
> 
> 'abc INSERT  def INSERT  ghi INSERT '
> 
> I don't, however, precisely understand what is meant by "the group of
> the same number" -- or maybe I do, but it isn't explicit. Is this just
> a shorthand reference to match.group(1) -- if that were valid --
> implying that the group match result is printed in the compile
> execution?

If I understand you correctly, you are right. Another example:

>>> re.compile(r"([a-z]+)(\d+)").sub(r"number=\2 word=\1", "a1 zzz42")
'number=1 word=a number=42 word=zzz'

For every match of "([a-z]+)(\d+)" in the original string, "\1" in
"number=\2 word=\1" is replaced with the text matched by the first group,
"[a-z]+", and "\2" is replaced with the text matched by the second group,
"\d+".
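
To make the link to match.group() explicit, here is a small sketch that
passes a function instead of a template string to sub(). The helper name
swap() is made up for this illustration; it builds by hand what the
template "number=\2 word=\1" builds for you.

```python
import re

pattern = re.compile(r"([a-z]+)(\d+)")
text = "a1 zzz42"

def swap(match):
    # match.group(1) is what "\1" stands for in a template,
    # match.group(2) is what "\2" stands for.
    return "number=%s word=%s" % (match.group(2), match.group(1))

with_template = pattern.sub(r"number=\2 word=\1", text)
with_function = pattern.sub(swap, text)

print(with_template)
print(with_function)
```

Both calls should give the same string, which suggests reading "\1" in the
replacement as shorthand for match.group(1) of the current match.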

The result, e.g. "number=1 word=a", is then used to replace the entire
match (group 0), i.e. "a1" in the example.
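
If it helps, the same substitution can be spelled out with an iterator.
This is only a sketch of the idea, not how the re module implements sub():
it walks the matches with finditer(), expands the template for each one
with match.expand(), and splices the result in place of group 0. The
variable names are chosen for the example.

```python
import re

pattern = re.compile(r"([a-z]+)(\d+)")
template = r"number=\2 word=\1"
text = "a1 zzz42"

pieces = []
last_end = 0
for match in pattern.finditer(text):
    # Keep the text between the previous match and this one unchanged.
    pieces.append(text[last_end:match.start()])
    # Fill in "\1" and "\2" from this match's groups.
    pieces.append(match.expand(template))
    # Skip over the whole match (group 0); it is what gets replaced.
    last_end = match.end()
# Keep whatever follows the last match.
pieces.append(text[last_end:])

rebuilt = "".join(pieces)
print(rebuilt)
```

For this input, rebuilt should equal pattern.sub(template, text).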

Peter




