Bug? concatenate a number to a backreference: re.sub(r'(zzz:)xxx', r'\1'+str(4444), somevar)

Peter Otten __peter__ at web.de
Fri Oct 23 13:54:39 CEST 2009


abdulet wrote:

> Well, is this normal? I want to concatenate a number to a
> backreference in a regular expression. I'm working on a multiprocess
> script, so my first thought was an error in the multiprocess
> logic, but what a surprise! When I arrived at this conclusion after
> some time debugging, I saw that:
> 
> import re
> aa = "zzz:xxx"
> re.sub(r'(zzz:).*',r'\1'+str(3333),aa)
> '[33'

If you perform the addition you get r"\13333". How should the regular 
expression engine interpret that? As the backreference to group 1, 13, ... 
or 13333? It picks something completely different, "[33", because "\133" is 
the octal escape sequence for "[":

>>> chr(0o133)
'['
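
To make the ambiguity concrete, here is a minimal sketch that appends one digit and then four digits to r"\1". The names text, pattern, two_digit_outcome and octal_outcome are only illustrative, and the behaviour shown assumes a reasonably recent Python 3.

```python
import re

text = "zzz:xxx"
pattern = r"(zzz:).*"

# One extra digit: r"\13" is parsed as a reference to group 13, which
# does not exist in this pattern, so re.sub() raises re.error.
try:
    re.sub(pattern, r"\1" + "3", text)
    two_digit_outcome = "no error"
except re.error:
    two_digit_outcome = "invalid group reference"

# Three octal digits after the backslash: r"\133" is read as the octal
# escape for "[", and the remaining "33" is literal text.
octal_outcome = re.sub(pattern, r"\1" + "3333", text)

print(two_digit_outcome)  # invalid group reference
print(octal_outcome)      # [33
```

Neither result refers to group 1, which is why the reference needs an explicit end.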

You can avoid the ambiguity with

extra = str(number)
extra = re.escape(extra) 
re.sub(expr, r"\g<1>" + extra, text)

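The re.escape() step is not necessary here, since a string of digits contains 
nothing to escape. When extra is an arbitrary string, only backslashes are 
special in a replacement template, so doubling them with 
extra.replace("\\", r"\\") is the safer escape; re.escape() would leave stray 
backslashes in the output.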
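
Applied to the example from the question, the \g<1> form might look like the sketch below. The variable names and the append_number helper are hypothetical, chosen for illustration; the replacement function is an alternative I am adding, which avoids template parsing altogether.

```python
import re

text = "zzz:xxx"
pattern = r"(zzz:).*"
number = 3333

# \g<1> ends the group reference explicitly, so the digits that follow
# are plain literal text.
fixed = re.sub(pattern, r"\g<1>" + str(number), text)

# Alternative: a replacement function is not parsed as a template, so
# the appended string is never interpreted for escapes.
def append_number(match):
    return match.group(1) + str(number)

also_fixed = re.sub(pattern, append_number, text)

print(fixed)       # zzz:3333
print(also_fixed)  # zzz:3333
```

The function form is probably the simpler choice when the appended text comes from untrusted or arbitrary input, because no escaping is needed at all.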

Peter



