[New-bugs-announce] [issue34304] clarification on escaping \d in regular expressions
Saba Kauser
report at bugs.python.org
Wed Aug 1 01:13:39 EDT 2018
New submission from Saba Kauser <skauseribmdb at gmail.com>:
Hello,
I have a program that works well up to Python 3.6 but fails with Python 3.7.
import re
pattern="DBMS_NAME: string(%d) %s"
sym = ['\[','\]','\(','\)']
for chr in sym:
    pattern = re.sub(chr, '\\' + chr, pattern)
    print(pattern)
pattern=re.sub('%s','.*?',pattern)
print(pattern)
pattern = re.sub('%d', '\\d+', pattern)
print(pattern)
result=re.match(pattern, "DBMS_NAME: string(8) \"DB2/NT64\" ")
print(result)
result=re.match("DBMS_NAME python4: string\(\d+\) .*?", "DBMS_NAME python4: string(8) \"DB2/NT64\" ")
print(result)
Expected output:
DBMS_NAME: string(%d) %s
DBMS_NAME: string(%d) %s
DBMS_NAME: string\(%d) %s
DBMS_NAME: string\(%d\) %s
DBMS_NAME: string\(%d\) .*?
DBMS_NAME: string\(\d+\) .*?
<re.Match object; span=(0, 21), match='DBMS_NAME: string(8) '>
<re.Match object; span=(0, 29), match='DBMS_NAME python4: string(8) '>
However, the following statement fails with Python 3.7:
pattern = re.sub('%d', '\\d+', pattern)
DBMS_NAME: string(%d) %s
DBMS_NAME: string(%d) %s
DBMS_NAME: string\(%d) %s
DBMS_NAME: string\(%d\) %s
DBMS_NAME: string\(%d\) .*?
Traceback (most recent call last):
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\d'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "pattern.txt", line 11, in <module>
    pattern = re.sub('%d', '\\d+', pattern)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "c:\users\skauser\appdata\local\programs\python\python37\lib\sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \d at position 0
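The failure does not seem to depend on the rest of the program. A minimal sketch that reduces it to a single call, assuming Python 3.7 or later (the input string 'string(%d)' and the variable name `caught` are only illustrative):

```python
import re

caught = None
try:
    # '\\d+' is the three characters \d+; the replacement template
    # parser treats \d as an unknown escape and rejects it.
    re.sub('%d', '\\d+', 'string(%d)')
except re.error as exc:
    caught = exc

print(caught)
```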
If I change the statement to use three backslashes, like
pattern = re.sub('%d', '\\\d+', pattern)
I can generate the correct regular expression.
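For reference, a minimal sketch of ways to get a literal \d+ into the result without relying on how '\\\d+' happens to be interpreted, assuming Python 3.7 or later. The variable names below are illustrative and not from my program. The first option doubles the backslash in a raw replacement string, the second passes a function as the replacement (its return value is inserted without escape processing), and the third skips re.sub and uses str.replace, since '%d' is a plain substring here.

```python
import re

# State of the pattern just before the failing call.
template = r"DBMS_NAME: string\(%d\) .*?"

# Option 1: double the backslash so the replacement template
# produces a literal backslash followed by "d+".
with_escaped_backslash = re.sub('%d', r'\\d+', template)

# Option 2: pass a callable; re.sub inserts its return value as-is.
with_callable = re.sub('%d', lambda m: r'\d+', template)

# Option 3: plain string replacement, no regex template involved.
with_str_replace = template.replace('%d', r'\d+')

print(with_escaped_backslash)
print(with_callable)
print(with_str_replace)
print(re.match(with_callable, 'DBMS_NAME: string(8) "DB2/NT64" '))
```

All three variants should print the same pattern, the one shown on the last pattern line of the expected output.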
Can you please confirm whether this has changed in Python 3.7, and whether we now need to escape the 'd' in '\d' as well?
Thank you!
----------
components: Regular Expressions
messages: 322842
nosy: ezio.melotti, mrabarnett, sabakauser
priority: normal
severity: normal
status: open
title: clarification on escaping \d in regular expressions
type: behavior
versions: Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34304>
_______________________________________