Backslash escape in regular expressions

Jonathan Giddy jon at bezek.dstc.monash.edu.au
Mon Dec 11 05:18:37 CET 2000


Hi,

According to the re module documentation, backslash either escapes special
characters, or signals a special sequence.  The special sequences are
then listed.

However, as the code below shows, some escape sequences (mainly the
whitespace characters) are treated specially but aren't listed.  Is this
a lapse in the re implementation or in the re documentation?  Can I safely
expect re.compile(r'\(hello\)\n') to always match '(hello)\n' (the current
behaviour) and never '(hello)n' (the documented behaviour)?
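
The specific case in question can be checked directly. The following is
a minimal sketch, not taken from the re documentation; the variable names
are illustrative, and it only reports what the interpreter it runs on
does, so it cannot show whether the behaviour is guaranteed.

```python
import re

# Raw string: the re module itself receives a backslash followed by 'n',
# not an already-translated newline character.
pattern = re.compile(r'\(hello\)\n')

# Candidate 1: the literal text followed by a real newline.
matches_newline = pattern.match('(hello)\n') is not None

# Candidate 2: the literal text followed by the letter 'n'.
matches_letter_n = pattern.match('(hello)n') is not None

print('matches "(hello)" + newline: %s' % matches_newline)
print('matches "(hello)" + letter n: %s' % matches_letter_n)
```

The pattern is written as a raw string on purpose: without the r prefix,
Python's own string-literal processing would turn \n into a newline
before re ever saw it, and the test would no longer exercise the
regular expression escape handling.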

Thanks,
	Jon.


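import re, traceback

# Classify backslash + each lowercase letter as seen by the re module.
for c in 'abcdefghijklmnopqrstuvwxyz':
    print c, '-',
    try:
        if c in 'AbBdDsSwWZ':
            # Listed as a special sequence in the re documentation.
            print 'documented escape'
        else:
            r = re.compile('\\' + c)
            if r.match(c):
                # Backslash + letter matches the letter itself.
                print 'documented ordinary'
            else:
                # Compare against the Python string escape, e.g. '\n'.
                escape = eval('"\\%s"' % c)
                if r.match(escape):
                    print 'undocumented escape "\\%s"' % c
                else:
                    print '??'
    except:
        # Finish the current output line, then show the error.
        print
        traceback.print_exc()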


