How to insert string in each match using RegEx iterator
Peter Otten
__peter__ at web.de
Wed Jun 10 11:13:50 EDT 2009
504crank at gmail.com wrote:
> I wonder if you (or anyone else) might attempt a different explanation
> for the use of the special sequence '\1' in the RegEx syntax.
>
> The Python documentation explains:
>
> \number
> Matches the contents of the group of the same number. Groups are
> numbered starting from 1. For example, (.+) \1 matches 'the the' or
> '55 55', but not 'the end' (note the space after the group). This
> special sequence can only be used to match one of the first 99 groups.
> If the first digit of number is 0, or number is 3 octal digits long,
> it will not be interpreted as a group match, but as the character with
> octal value number. Inside the '[' and ']' of a character class, all
> numeric escapes are treated as characters.
>
> In practice, this appears to be the key device in your clever
> solution:
>
> >>> re.compile(r"(\d+)").sub(r"INSERT \1", string)
>
> 'abc INSERT 123 def INSERT 456 ghi INSERT 789'
>
> >>> re.compile(r"(\d+)").sub(r"INSERT ", string)
>
> 'abc INSERT def INSERT ghi INSERT '
>
> I don't, however, precisely understand what is meant by "the group of
> the same number" -- or maybe I do, but it isn't explicit. Is this just
> a shorthand reference to match.group(1) -- if that were valid --
> implying that the group match result is printed in the compile
> execution?
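If I understand you correctly, you are right: in the replacement string,
"\1" stands for what match.group(1) would return for the current match.
The substitution is performed by sub(), not by compile(). Another example: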
>>> re.compile(r"([a-z]+)(\d+)").sub(r"number=\2 word=\1", "a1 zzz42")
'number=1 word=a number=42 word=zzz'
For every match of "([a-z]+)(\d+)" in the original string, "\1" in
"number=\2 word=\1" is replaced with the text matched by group 1,
"[a-z]+", and "\2" is replaced with the text matched by group 2, "\d+".
The result, e.g. "number=1 word=a", is then used to replace the whole
match (group 0), i.e. "a1" in the example.
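To make the link to match.group() explicit, here is a small sketch that
spells the same substitution three ways. The helper name swap() is made up
for illustration. The last lines show the other use of "\1", inside the
pattern itself, as in the documentation you quoted; I added "^" and "$"
there so that the whole string has to match.

```python
import re

pattern = re.compile(r"([a-z]+)(\d+)")
text = "a1 zzz42"

# Template form: \1 and \2 refer to the groups of the current match.
by_template = pattern.sub(r"number=\2 word=\1", text)

# Callable form: sub() calls the function once per match and inserts
# whatever string it returns. swap() is a hypothetical helper name.
def swap(match):
    return "number=%s word=%s" % (match.group(2), match.group(1))

by_function = pattern.sub(swap, text)

# \g<N> is an alternative spelling of \N in the replacement template;
# it helps when a literal digit follows the group reference.
by_g = pattern.sub(r"number=\g<2> word=\g<1>", text)

print(by_template)
print(by_function == by_template, by_g == by_template)

# Inside a pattern, \1 matches the same text that group 1 matched.
repeated = re.compile(r"^(.+) \1$")
print(repeated.match("the the") is not None)
print(repeated.match("the end") is not None)
```

All three sub() calls should give the same string as the interactive
example above. The callable form is the one to reach for when the
replacement needs more than reordering, e.g. arithmetic on the number.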
Peter