problem with newlines in regexp substitution
James Stroud
jstroud at ucla.edu
Thu Feb 23 16:10:36 EST 2006
Florian Schulze wrote:
> See the following results:
>
> Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
>
>>>> import re
>>>> s = "1"
>>>> re.sub('1','\\n',s)
>
> '\n'
>
>>>> '\\n'
>
> '\\n'
>
>>>> re.sub('1',r'\\n',s)
>
> '\\n'
>
>>>> s.replace('1','\\n')
>
> '\\n'
>
>>>> repl = '\\n'
>>>> re.sub('1',repl,s)
>
> '\n'
>
>>>> s.replace('1',repl)
>
> '\\n'
>
> Why is the behaviour of the regexp substitution so weird and can I
> prevent that? It breaks my assumptions and thus my code.
>
> Regards,
> Florian Schulze
>
"Why" questions are always tough to answer. E.g.: Why are we here?
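The answer to "what is happening" is much easier. A replacement string
goes through two rounds of escape processing: first the Python parser
turns the literal '\\n' into the two characters \ and n, and then
re.sub reads those two characters as an escape of its own and emits a
newline. str.replace does no second round, which is why it leaves the
backslash alone. So for the regex engine, escapes must themselves be
escaped. This is why raw strings were invented. If it weren't for
these, I'd still be using Perl. In a raw string a '\' stands for
itself, so r'\\n' reaches re.sub as two backslashes and an n, which
is the result you noticed. In the olden days, you'd have to type "\\\\"
to mean a literal backslash in a pattern, so creating a literal
backslash in a regex that produced a string that would then itself be
used in a regex would be '\\\\\\\\\\\\\\\\', which scared me away from
Python for a couple of years (remember, the final printed product
would be '\').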
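A small sketch of that double processing, reusing s and repl from your
session (the names via_re and via_str are mine, purely for
illustration):

```python
import re

s = "1"
repl = '\\n'   # after Python parses the literal: a backslash and an n

# Round 1 (the Python parser) is already done: repl holds 2 characters.
print(len(repl))                      # 2

# Round 2 (re.sub) treats the replacement as a template and turns the
# backslash-n pair into one newline character.
via_re = re.sub('1', repl, s)

# str.replace has no second round, so both characters survive.
via_str = s.replace('1', repl)

print(len(via_re), repr(via_re))      # 1 '\n'
print(len(via_str), repr(via_str))    # 2 '\\n'
```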
That patently doesn't answer your question, but here is something to ponder:
py> s.replace('1',repl)[0]
'\\'
py> print s.replace('1',repl)
\n
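As for preventing it, here is a minimal sketch, assuming you want repl
inserted exactly as written. Either hand re.sub a function, whose
return value is used without any escape processing, or double the
backslashes yourself before the call. The variable names are only
illustrative:

```python
import re

s = "1"
repl = '\\n'   # a backslash and an n, to be inserted literally

# Option 1: a callable replacement. re.sub inserts whatever the
# function returns and does not interpret backslashes in it.
literal_via_func = re.sub('1', lambda match: repl, s)

# Option 2: double every backslash so that re.sub's own escape
# processing turns each pair back into a single backslash.
literal_via_escape = re.sub('1', repl.replace('\\', '\\\\'), s)

print(repr(literal_via_func))      # '\\n'
print(repr(literal_via_escape))    # '\\n'
```

Both give the same result as s.replace('1', repl). The function form
is probably the safer habit when repl comes from somewhere you don't
control.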
James