Backslash escape in regular expressions

Jonathan Giddy jon at bezek.dstc.monash.edu.au
Mon Dec 11 22:55:12 EST 2000


Peter Hansen <peter at engcorp.com> writes:

] Jonathan Giddy wrote:
] > 
] > According to the re module documentation, backslash either escapes special
] > characters, or signals a special sequence.  The special sequences are
] > then listed.
] > 
] > However, as this code shows, there are some special sequences (mainly
] > the whitespace characters) that are special, but aren't listed.  Is this
] > a lapse in the re implementation or the re documentation?  Can I safely
] > expect re.compile(r'\(hello\)\n') to always match '(hello)\n' (the current
] > behaviour) and not match '(hello)n' (the documented behaviour?)
] 
] The documentation I have clearly shows that \\ is a special sequence
] which turns into the backslash character itself.  

I agree that the documentation (Section 4.2.1 of the Library Reference)
states this.  But you're paying too much attention to the code <0.5 wink>.
\\ is irrelevant to the problem: apart from the code sample, it doesn't
appear anywhere in the problem description.

] With the raw-string
] form with 'r' your "current behaviour" above *is* the documented
] behaviour, isn't it?  At least, just using those strings with "print"
] shows that you don't get "(hello)n"...

Consider re.compile(r'\y').  \y is clearly never a special sequence.  Since 
this is a raw string, the re module gets a string with the two characters
'\' and 'y'.  By my reading of the re module documentation, \y should match 
a plain y, which it does.
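
A small sketch of that case, for anyone who wants to try it.  The names
raw and outcome are only for illustration.  One caveat: Python 3.6 and
later reject unknown letter escapes such as \y with re.error, so the
sketch reports whichever result it gets rather than assuming the
behaviour described above.

```python
import re

raw = r'\y'
# A raw string keeps the backslash: two characters, '\' and 'y'.
raw_length = len(raw)

try:
    pattern = re.compile(raw)
except re.error:
    # Newer re modules refuse unknown letter escapes outright.
    outcome = 'rejected'
else:
    # Older re modules treat \y as a plain y.
    outcome = 'matches y' if pattern.match('y') else 'no match'

print(raw_length, outcome)
```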

Now consider re.compile(r'\n').  In an ordinary string literal \n is a
newline, but inside a raw string it is the two characters '\' and 'n'.
By my reading
of the re module documentation, \n is not a "special character escape" (\*, 
\?, and so forth), since 'n' is not a special character in a regex.  In
addition, it is not a "special sequence", since it does not consist of
'\' and a character from the list in the documentation.  Therefore, the
documentation indicates that, like \y, \n should match a plain n.
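
To make the mismatch concrete, here is a minimal check, assuming nothing
beyond the standard re module (the variable names are mine).  It confirms
that the raw string really is backslash plus 'n', and then asks the re
module which of the two candidate strings it matches.

```python
import re

raw = r'\n'
# Inside a raw string the backslash survives: two characters, not one.
two_chars = (len(raw) == 2 and raw[0] == '\\' and raw[1] == 'n')

# What the re module actually does with backslash + n:
matches_newline = re.match(raw, '\n') is not None
matches_plain_n = re.match(raw, 'n') is not None

print(two_chars, matches_newline, matches_plain_n)
```

So the re module itself interprets the two-character sequence as a
newline, which is what the documentation's list of special sequences
does not mention.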

So, going by the documentation, the regex in the example should match the
Python string '(hello)n', but it instead matches the Python string
'(hello)\n'.  I prefer the actual behaviour, but think the documentation
should indicate that \a, \f, \n, \t, \v, and \x are also "special
sequences".
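
A rough way to check this for yourself, as a sketch only: the list of
cases and the variable names are mine, and \x41 stands in for \x because
\x needs hex digits after it.

```python
import re

# The pattern from the original question.
pattern = re.compile(r'\(hello\)\n')
matches_newline = pattern.match('(hello)\n') is not None
matches_plain_n = pattern.match('(hello)n') is not None

# Each raw pattern paired with the single character it is expected to match.
cases = [
    (r'\a', '\a'),    # bell
    (r'\f', '\f'),    # form feed
    (r'\n', '\n'),    # newline
    (r'\t', '\t'),    # tab
    (r'\v', '\v'),    # vertical tab
    (r'\x41', 'A'),   # hex escape
]
handled_by_re = [p for p, ch in cases if re.match(p, ch)]

print(matches_newline, matches_plain_n, handled_by_re)
```

If every entry in cases ends up in handled_by_re, the re module is
treating all of these as escapes in their own right, whatever the list
in the documentation says.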




