RE multiline
Diez B. Roggisch
deets at nospam.web.de
Sun Nov 30 15:33:41 EST 2008
Guy Doune wrote:
> Hi,
>
> I am trying to figure out what the equivalent of:
>
> (.*?)
>
> would be for matching across multiple lines.
>
> I would like to capture the variable part of the elements that I am searching for.
>
> Example :
>
> <table width="95%" cellpadding="0" cellspacing="0" border="0"
> align="center">
>
> is the beginning of the variable element that I want to capture, and
>
> </table>
>
> is the end of the element, so I would like to capture whatever lies between
> those two patterns.
>
>
> pattern1+r"(.*?)"+pattern2
>
> This worked fine for a single-line selection like:
>
> <table width="95%" cellpadding="0" cellspacing="0" border="0"
> align="center">"variable element of the search"</table>
>
> I hope I have been clear.
See the flags of the re module, especially re.DOTALL, which makes "." match
newlines as well.
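
A minimal sketch of what the flag changes, assuming a made-up page fragment and
a simplified pattern1 (r"<table\b[^>]*>") in place of the literal opening tag
from the question:

```python
import re

# Hypothetical sample input; the opening tag spans two lines as in the question.
html = '''<table width="95%" cellpadding="0" cellspacing="0" border="0"
align="center">
first line of the variable part
second line of the variable part
</table>'''

# Assumed patterns for illustration. [^>] already matches newlines,
# so the two-line opening tag is handled without any flag.
pattern1 = r"<table\b[^>]*>"
pattern2 = r"</table>"

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
without_flag = re.search(pattern1 + r"(.*?)" + pattern2, html)

# With re.DOTALL, "." also matches newlines and the group spans several lines.
with_flag = re.search(pattern1 + r"(.*?)" + pattern2, html, re.DOTALL)

print(without_flag)
print(with_flag.group(1))
```

The same effect is available inline by starting the pattern with (?s).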
However, what you are running into is that regular expressions aren't the
proper tool for dealing with HTML/XML.
What would you do if, for example, a table was nested inside another?
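
To make the nesting problem concrete, here is a small sketch with a made-up
snippet in which one table sits inside another. The non-greedy group stops at
the first closing tag it sees, which belongs to the inner table:

```python
import re

# Hypothetical input: an outer table whose first cell contains an inner table.
html = (
    "<table><tr><td>"
    "<table><tr><td>inner</td></tr></table>"
    "</td><td>outer</td></tr></table>"
)

# Even with re.DOTALL, the non-greedy group ends at the FIRST </table>.
match = re.search(r"<table>(.*?)</table>", html, re.DOTALL)
captured = match.group(1)

print(captured)
# The outer table's second cell is missing from the capture.
print("outer" in captured)
```

Switching to a greedy .* would run to the last closing tag instead, which then
swallows everything between two unrelated tables on the same page.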
Instead, use tools like BeautifulSoup or lxml, which provide
error-tolerant HTML parsers and expression/filter-based element
extraction. That's much better suited to your task.
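
A rough sketch of the parser-based approach, assuming the current
beautifulsoup4 package (imported as bs4) with Python's built-in "html.parser"
backend; the sample markup and the attribute filter are made up for
illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the table from the question, with a nested table inside.
html = """<table width="95%" cellpadding="0" cellspacing="0" border="0"
align="center"><tr><td><table><tr><td>inner</td></tr></table></td><td>outer</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Select the table by its attributes instead of matching the tag text.
table = soup.find("table", attrs={"width": "95%", "align": "center"})

# Everything between the opening tag and its matching closing tag,
# including the nested table.
inner_markup = table.decode_contents()

# Only the text content, with markup stripped.
text = table.get_text(" ", strip=True)

print(inner_markup)
print(text)
```

Because the parser tracks which closing tag belongs to which opening tag, the
nested table ends up inside the result rather than cutting it short.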
Diez