RE Question
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Aug 3 00:13:17 EDT 2009
En Sun, 02 Aug 2009 18:22:20 -0300, Victor Subervi
<victorsubervi at gmail.com> escribió:
> How do I search and replace something like this:
> aLine = re.sub('[<]?[p]?[>]?<font size="h' + str(x) + '"[
> a-zA-Z0-9"\'=:]*>[<]?[b]?[>]?', '<h' + str(x) + '>', aLine)
> where RE *only* looks for the possibility of "<p>" at the beginning of
> the
> string; that is, not the individual components as I have it coded above,
> but
> the entire 3-character block?
An example would make it more clear; I think you want to match either
"<p><font size=...." or "<font size=....". In other words, "<p>" is
optional. Use a normal group or a non-capturing group:
r'(<p>)?<font size="...'
r'(?:<p>)?<font size="...'
That said, using regular expressions to parse HTML or XML is terribly
fragile; I'd use a specific tool (like BeautifulSoup, ElementTree, or lxml)
--
Gabriel Genellina
More information about the Python-list
mailing list