RE Question

Victor Subervi victorsubervi at gmail.com
Tue Aug 4 00:05:13 CEST 2009


That worked. Thank you again :)
Victor

On Mon, Aug 3, 2009 at 12:13 AM, Gabriel Genellina
<gagsl-py2 at yahoo.com.ar> wrote:

> On Sun, 02 Aug 2009 18:22:20 -0300, Victor Subervi
> <victorsubervi at gmail.com> wrote:
>
>
>> How do I search and replace something like this:
>> aLine = re.sub('[<]?[p]?[>]?<font size="h' + str(x) + '"[ a-zA-Z0-9"\'=:]*>[<]?[b]?[>]?', '<h' + str(x) + '>', aLine)
>> where RE *only* looks for the possibility of "<p>" at the beginning of the
>> string; that is, not the individual components as I have it coded above,
>> but the entire 3-character block?
>>
>
> An example would make it more clear; I think you want to match either
> "<p><font size=...." or "<font size=....". In other words, "<p>" is
> optional. Use a normal group or a non-capturing group:
> r'(<p>)?<font size="...'
> r'(?:<p>)?<font size="...'
>
> That said, using regular expressions to parse HTML or XML is terribly
> fragile; I'd use a specific tool (like BeautifulSoup, ElementTree, or lxml).
>
> --
> Gabriel Genellina
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
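A minimal sketch of the non-capturing-group fix applied to the pattern from the question. The helper name `font_to_heading` is hypothetical; it takes the line and the heading level (the `aLine` and `x` of the original). `(?:<p>)?` and `(?:<b>)?` make each whole tag optional as a single unit, whereas `[<]?[p]?[>]?` lets every character match on its own, so it would also swallow a stray `p` or `<>` in front of the `<font>` tag.

```python
import re

def font_to_heading(line, level):
    # (?:<p>)? and (?:<b>)? treat "<p>" and "<b>" as indivisible optional
    # blocks; the attribute class [ a-zA-Z0-9"\'=:]* is kept from the question.
    pattern = (r'(?:<p>)?<font size="h' + str(level) +
               r'"[ a-zA-Z0-9"\'=:]*>(?:<b>)?')
    return re.sub(pattern, '<h' + str(level) + '>', line)

print(font_to_heading('<p><font size="h2" face="Arial"><b>Title', 2))
# prints <h2>Title
```

If you need to know afterwards whether the `<p>` was there, use the capturing form `(<p>)?` with `re.search` and check whether `group(1)` is `None`.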
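Along the lines of the suggestion to use a real parser, here is a rough sketch with the standard library's `xml.etree.ElementTree`. It assumes the fragment is well-formed (every tag closed), which real-world HTML often is not; BeautifulSoup or lxml are more forgiving there. The helper name `fonts_to_headings` is hypothetical, and it only renames `<font size="hN">` to `<hN>`; unwrapping the surrounding `<p>`/`<b>` is left out.

```python
import xml.etree.ElementTree as ET

def fonts_to_headings(fragment):
    """Rename <font size="hN" ...> elements to <hN>, dropping their attributes.

    Assumes `fragment` is well-formed XML/XHTML; ET.fromstring raises
    ET.ParseError otherwise.
    """
    # Wrap in a dummy root so fragments with several top-level tags parse.
    root = ET.fromstring("<div>" + fragment + "</div>")
    # Materialize the list first so renaming doesn't interfere with iteration.
    for el in list(root.iter("font")):
        size = el.get("size", "")
        if size.startswith("h") and size[1:].isdigit():
            el.tag = size        # e.g. "h2"
            el.attrib.clear()    # drop size=, face=, color=, ...
    # Serialize the children (tostring includes each element's tail text)
    # and leave out the dummy wrapper.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )

print(fonts_to_headings('<p><font size="h2" face="Arial">Title</font></p>'))
# prints <p><h2>Title</h2></p>
```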


More information about the Python-list mailing list