How to insert string in each match using RegEx iterator

504crank at gmail.com
Thu Jun 11 11:14:13 EDT 2009


On Jun 10, 10:13 am, Peter Otten <__pete... at web.de> wrote:
> 504cr... at gmail.com wrote:
> > I wonder if you (or anyone else) might attempt a different explanation
> > for the use of the special sequence '\1' in the RegEx syntax.
>
> > The Python documentation explains:
>
> > \number
> >     Matches the contents of the group of the same number. Groups are
> > numbered starting from 1. For example, (.+) \1 matches 'the the' or
> > '55 55', but not 'the end' (note the space after the group). This
> > special sequence can only be used to match one of the first 99 groups.
> > If the first digit of number is 0, or number is 3 octal digits long,
> > it will not be interpreted as a group match, but as the character with
> > octal value number. Inside the '[' and ']' of a character class, all
> > numeric escapes are treated as characters.
>
> > In practice, this appears to be the key device in your
> > clever solution:
>
> > >>> re.compile(r"(\d+)").sub(r"INSERT \1", string)
>
> > 'abc INSERT 123 def INSERT 456 ghi INSERT 789'
>
> > >>> re.compile(r"(\d+)").sub(r"INSERT ", string)
>
> > 'abc INSERT  def INSERT  ghi INSERT '
>
> > I don't, however, precisely understand what is meant by "the group of
> > the same number" -- or maybe I do, but it isn't explicit. Is this just
> > a shorthand reference to match.group(1) -- if that were valid --
> > implying that the group match result is printed in the compile
> > execution?
>
> If I understand you correctly you are right. Another example:
>
> >>> re.compile(r"([a-z]+)(\d+)").sub(r"number=\2 word=\1", "a1 zzz42")
>
> 'number=1 word=a number=42 word=zzz'
>
> For every match of "[a-z]+\d+" in the original string "\1" in
> "number=\2 word=\1" is replaced with the actual match for "[a-z]+" and
> "\2" is replaced with the actual match for "\d+".
>
> The result, e.g. "number=1 word=a", is then used to replace the actual
> match for group 0 (the whole match), i.e. "a1" in the example.
>
> Peter

Wow! That is so cool. I had to process it for a little while to get
it.

>>> s = '111bbb333'
>>> re.compile(r'(\d+)([b]+)(\d+)').sub(r'First string: \1 Second string: \2 Third string: \3', s)
'First string: 111 Second string: bbb Third string: 333'
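
To convince myself that \1 in the replacement really is shorthand for match.group(1), here is a minimal sketch that does the same substitution with a function instead of a template. The names describe, by_template, by_function, expanded and repeated are just mine for illustration, and I assume nothing beyond the standard re module. The last part shows the other use of \1, inside the pattern itself, which is what the documentation paragraph quoted above describes.

```python
import re

s = '111bbb333'
pattern = re.compile(r'(\d+)(b+)(\d+)')

# Template form: \1, \2, \3 refer to the numbered groups of each match.
by_template = pattern.sub(
    r'First string: \1 Second string: \2 Third string: \3', s)

# Callable form: sub() passes each match object to the function, so
# match.group(n) plays the role of \n in the template.
def describe(match):
    return 'First string: %s Second string: %s Third string: %s' % (
        match.group(1), match.group(2), match.group(3))

by_function = pattern.sub(describe, s)

# expand() applies a replacement template to a single match object.
expanded = pattern.search(s).expand(r'\3-\2-\1')

# Inside a pattern, \1 matches the same text that group 1 matched.
repeated = re.compile(r'(.+) \1')
matches_the_the = repeated.match('the the') is not None
matches_the_end = repeated.match('the end') is not None
```

Both by_template and by_function should come out identical, which is how I read Peter's explanation: the template is expanded once per match, using that match's groups.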

MRI scans would no doubt reveal that people who attain a mastery of
regular expressions must have highly developed areas of the brain. I
wonder where the RegEx part of the brain might be located.

That was a really clever teaching device. I really appreciate you
taking the time to post it, Peter. I'm definitely getting a schooling
on this list.

Thanks!


