[Tutor] regular expressions - backslashes
Kent Johnson
kent37 at tds.net
Thu Apr 6 11:54:41 CEST 2006
Justin Ezequiel wrote:
> a colleague demonstrated a problem he had with regular expressions
>
>>>> apppath = os.path.abspath(os.curdir)
>>>> apppath
> 'C:\\napp_and_author_query\\napp_and_author_query\\a b c d'
>>>> template = r'<AppPath>\files'
>>>> template
> '<AppPath>\\files'
>>>> re.sub(r'(?i)<apppath>', apppath, template)
> 'C:\napp_and_author_query\napp_and_author_query\x07 b c d\\files'
>>>> print re.sub(r'(?i)<apppath>', apppath, template)
> C:
> app_and_author_query
> app_and_author_query b c d\files
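The problem is that the re engine itself is interpreting the backslashes
in the replacement string: \n becomes a newline and \a becomes a bell
character (\x07), which is what shows up in the output above. Here is a
simpler example: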
In [34]: import re
In [35]: text = 'abcabc'
With a single backslash you get a newline, even though the backslash is
a literal character in the raw replacement string:
In [36]: re.sub('a', r'\n', text)
Out[36]: '\nbc\nbc'
So if you want a literal \ in your replacement text you have to escape
the \, even in a raw string:
In [37]: re.sub('a', r'\\n', text)
Out[37]: '\\nbc\\nbc'
If it isn't a raw string, you need four backslashes!
In [39]: re.sub('a', '\\\\n', text)
Out[39]: '\\nbc\\nbc'
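To see where the four come from, it can help to look at the two layers
separately: Python's string-literal parser runs first, then the re
module's replacement-template parser. A small sketch (the variable names
are only for illustration):

```python
import re

text = 'abcabc'

# Layer 1: the Python literal. Both spellings produce the same
# three-character string: backslash, backslash, 'n'.
raw = r'\\n'
plain = '\\\\n'
assert raw == plain
assert len(raw) == 3

# Layer 2: re.sub() reads that string as a template and collapses
# the doubled backslash into one literal backslash followed by 'n'.
result = re.sub('a', raw, text)
assert result == '\\nbc\\nbc'   # two characters (\ and n) replace each 'a'
assert len(result) == 8
```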
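For a simple replacement like this one you can use re.escape() to
introduce the needed backslashes. Note that it also escapes other
characters, such as spaces, and those extra backslashes show up in the
output, so it is not a general fix for replacement strings: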
In [38]: re.sub('a', re.escape(r'\n'), text)
Out[38]: '\\nbc\\nbc'
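Applied to the original question, one way to sidestep the template
processing entirely is to pass a function as the replacement, since
re.sub() uses a function's return value as-is. Another is to double only
the backslashes by hand. A minimal sketch, with the path hard-coded
rather than taken from os.path.abspath() so that it runs anywhere:

```python
import re

apppath = r'C:\napp_and_author_query\napp_and_author_query\a b c d'
template = r'<AppPath>\files'

# Option 1: a callable replacement. Its return value is inserted
# verbatim, so backslashes in apppath are never treated as escapes.
via_function = re.sub(r'(?i)<apppath>', lambda m: apppath, template)

# Option 2: double each backslash so the template parser turns
# every pair back into a single literal backslash.
via_escaping = re.sub(r'(?i)<apppath>',
                      apppath.replace('\\', '\\\\'),
                      template)

expected = r'C:\napp_and_author_query\napp_and_author_query\a b c d\files'
assert via_function == expected
assert via_escaping == expected
```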
Kent