[Tutor] regular expressions - backslashes

Kent Johnson kent37 at tds.net
Thu Apr 6 11:54:41 CEST 2006


Justin Ezequiel wrote:
> a colleague demonstrated a problem he had with regular expressions
> 
> >>> apppath = os.path.abspath(os.curdir)
> >>> apppath
> 'C:\\napp_and_author_query\\napp_and_author_query\\a b c d'
> >>> template = r'<AppPath>\files'
> >>> template
> '<AppPath>\\files'
> >>> re.sub(r'(?i)<apppath>', apppath, template)
> 'C:\napp_and_author_query\napp_and_author_query\x07 b c d\\files'
> >>> print re.sub(r'(?i)<apppath>', apppath, template)
> C:
> app_and_author_query
> app_and_author_query b c d\files

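The problem is that the re engine itself interprets backslashes in the
replacement string, independently of Python's own string-literal
escaping. In your path, \n became a newline and \a became the bell
character (\x07). Here is a simpler example:
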
In [34]: import re

In [35]: text = 'abcabc'

With a single backslash you get a newline, even though the backslash is
a literal character in the replacement string:

In [36]: re.sub('a', r'\n', text)
Out[36]: '\nbc\nbc'

So if you want a literal \ in your replacement text you have to escape 
the \, even in a raw string:

In [37]: re.sub('a', r'\\n', text)
Out[37]: '\\nbc\\nbc'

If it isn't a raw string, you need four backslashes:

In [39]: re.sub('a', '\\\\n', text)
Out[39]: '\\nbc\\nbc'
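
The count of four comes from two separate unescaping passes: first the
Python parser reads the string literal, then re reads the replacement
string. A small check of that, using only the strings from the examples
above (the variable names are just for illustration):

```python
import re

non_raw = '\\\\n'   # the Python parser halves this to: backslash, backslash, n
raw = r'\\n'        # a raw string keeps both backslashes as written

print(non_raw == raw)   # both literals spell the same three characters
print(len(raw))

# re.sub then reads the two backslashes as one literal backslash,
# so each 'a' is replaced by a backslash followed by the letter n.
result = re.sub('a', raw, 'abcabc')
print(result)
```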

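For this example re.escape() also introduces the needed backslashes, but
be careful: it is meant for patterns and escapes other characters as
well (spaces, for instance), and those extra backslashes can then show
up in the result. For replacement text it is safer to double only the
backslashes, e.g. s.replace('\\', '\\\\'). The re.escape() version: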

In [38]: re.sub('a', re.escape(r'\n'), text)
Out[38]: '\\nbc\\nbc'
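
Applied to the original problem, here is a minimal sketch of two ways
around it. It assumes Python 3 syntax, hard-codes the path instead of
calling os.path.abspath(), and the helper name fill_template is made up
for illustration. When the replacement is a function, re.sub inserts its
return value as-is, so no escaping is needed; doubling only the
backslashes should give the same result.

```python
import re

# Stand-in for os.path.abspath(os.curdir) from the original session.
apppath = r'C:\napp_and_author_query\napp_and_author_query\a b c d'
template = r'<AppPath>\files'

def fill_template(template, path):
    # A function replacement is used verbatim; re does not process
    # backslash escapes in the value it returns.
    return re.sub(r'(?i)<apppath>', lambda m: path, template)

via_function = fill_template(template, apppath)

# Alternative: double each backslash so re turns every pair back into one.
escaped = apppath.replace('\\', '\\\\')
via_escaping = re.sub(r'(?i)<apppath>', escaped, template)

print(via_function)
print(via_escaping == via_function)
```

The function form is probably the easier one to get right, since the
path never passes through re's replacement-string parsing at all.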

Kent


