Backslash escape in regular expressions
Jonathan Giddy
jon at bezek.dstc.monash.edu.au
Mon Dec 11 22:55:12 EST 2000
Peter Hansen <peter at engcorp.com> writes:
] Jonathan Giddy wrote:
] >
] > According to the re module documentation, backslash either escapes special
] > characters, or signals a special sequence. The special sequences are
] > then listed.
] >
] > However, as this code shows, there are some special sequences (mainly
] > the whitespace characters) that are special, but aren't listed. Is this
] > a lapse in the re implementation or the re documentation? Can I safely
] > expect re.compile(r'\(hello\)\n') to always match '(hello)\n' (the current
] > behaviour) and not match '(hello)n' (the documented behaviour)?
]
] The documentation I have clearly shows that \\ is a special sequence
] which turns into the backslash character itself.
I agree that the documentation (Section 4.2.1 of the Library Reference)
states this. But you're paying too much attention to the code <0.5 wink>.
\\ is irrelevant to the problem: apart from the code sample, \\ doesn't
appear in the problem description.
] With the raw-string
] form with 'r' your "current behaviour" above *is* the documented
] behaviour, isn't it? At least, just using those strings with "print"
] shows that you don't get "(hello)n"...
Consider re.compile(r'\y'). \y is clearly never a special sequence. Since
this is a raw string, the re module gets a string with the two characters
'\' and 'y'. By my reading of the re module documentation, \y should match
a plain y, which it does.
Now consider re.compile(r'\n'). \n is normally a newline, but inside a
raw string, it is actually the two characters '\' and 'n'. By my reading
of the re module documentation, \n is not a "special character escape" (\*,
\?, and so forth), since 'n' is not a special character in a regex. In
addition, it is not a "special sequence", since it does not consist of
'\' and a character from the list in the documentation. Therefore, the
documentation indicates that, like \y, \n should match a plain n.
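The gap between that reading and the module's behaviour shows up with the pattern from the original question. A minimal sketch, assuming a current Python 3 interpreter, which still translates the two-character escape \n into a newline just as described here:

```python
import re

pattern = r'\(hello\)\n'
# The raw string ends in a backslash and an 'n', not in a newline character.
print(pattern.endswith('\\n'), '\n' in pattern)  # True False

compiled = re.compile(pattern)
# The re parser nevertheless turns the escape \n into a newline.
matches_newline = compiled.fullmatch('(hello)\n') is not None
matches_plain_n = compiled.fullmatch('(hello)n') is not None
print(matches_newline, matches_plain_n)  # True False
```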
So, by the documentation, the regex in the example should match the Python
string '(hello)n', but it actually matches the Python string '(hello)\n'.
I prefer the latter, but I think the documentation should indicate that
\a, \f, \n, \t, \v, and \x are also "special sequences".
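The escapes in question can be checked in one pass. This sketch assumes a current Python 3 interpreter. It also includes \r, which the re parser accepts the same way, and uses \x41 because \x needs two hexadecimal digits. The dictionary name is illustrative only.

```python
import re

# Pattern (backslash escape as the re module sees it) -> the single
# character it is expected to match.
escapes = {
    r'\a': '\a',
    r'\f': '\f',
    r'\n': '\n',
    r'\r': '\r',
    r'\t': '\t',
    r'\v': '\v',
    r'\x41': 'A',
}

for pattern, expected in escapes.items():
    # Each pattern is a backslash plus at least one more character...
    assert len(pattern) > 1
    # ...yet it matches the single control (or hex-coded) character.
    assert re.fullmatch(pattern, expected) is not None
    # It does not match the bare letter that follows the backslash.
    assert re.fullmatch(pattern, pattern[1]) is None

print("all escapes behave like string-literal escapes")
```

Current versions of the re documentation do state that most of the standard string-literal escapes are also accepted by the regular expression parser.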