Questions about regex
Rob Williscroft
rtw at freenet.co.uk
Sat May 30 10:45:34 EDT 2009
wrote in news:fe9f707f-aaf3-4ca6-859a-5b0c63904fc0
@s28g2000vbp.googlegroups.com in comp.lang.python:
> text = re.sub('(\<(/?[^\>]+)\>)', "", text)#remove the HTML
>
Python has a /r/ (raw) string literal type that is handy for regexes:
text = re.sub( r'(\<(/?[^\>]+)\>)', "", text )
In raw strings Python doesn't process backslash escape sequences,
so r'\n' is the 2-char string '\\n' (a backslash followed by an 'n').
Without that your pattern string would need to be written as:
'(\\<(/?[^\\>]+)\\>)'
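IOW backslashes need to be doubled up or Python will process them
before they are passed to re.sub. (Python happens to leave escapes
it doesn't recognise, such as \<, alone, but relying on that is fragile.)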
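A quick sketch to check the raw-string point for yourself; the variable
names here are mine, purely for illustration:

```python
# r'\n' is two characters: a backslash followed by an 'n'.
two_chars = r'\n'
# '\n' without the r prefix is a single newline character.
one_char = '\n'

# The raw pattern and the doubled-backslash pattern are the same string.
raw_pattern = r'(\<(/?[^\>]+)\>)'
doubled_pattern = '(\\<(/?[^\\>]+)\\>)'

print(len(two_chars), len(one_char))
print(raw_pattern == doubled_pattern)
```

The two spellings give re.sub exactly the same pattern; the raw form is
just easier to read.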
Also this seems to be some non-Python dialect of regular expression
language; Python's re patterns don't need < and > escaped.
http://docs.python.org/library/re.html
The grouping operators, '(' and ')', appear to be unnecessary,
so altogether this one line should probably be:
text = re.sub( r'</?[^>]+>', '', text )
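As a rough check, assuming a made-up sample string (not from your post),
the simplified pattern and your original one should strip the same tags:

```python
import re

# Hypothetical input, invented for this example.
sample = '<p>Hello <b>world</b>!</p>'

# Simplified pattern: no escapes, no groups.
stripped = re.sub(r'</?[^>]+>', '', sample)

# Original pattern, written as a raw string.
stripped_original = re.sub(r'(\<(/?[^\>]+)\>)', '', sample)

print(stripped)                       # Hello world!
print(stripped == stripped_original)  # True
```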
Rob.
--
http://www.victim-prime.dsl.pipex.com/