Regular Expression help for parsing html tables

Odalrick odalrick at hotmail.com
Sun Oct 29 09:30:24 CET 2006


steve551979 at hotmail.com wrote:

> Hello,
>
> I am having some difficulty creating a regular expression for the
> following string situation in html. I want to find a table that has
> specific text in it and then extract the html just for that immediate
> table.
>
> the string would look something like this:
>
> ...stuff here...
> <table>
> ...stuff here...
> <table>
> ...stuff here...
> <table>
> ...
> text i'm searching for
> ...
> </table>
> ...stuff here...
> </table>
> ...stuff here...
> </table>
> ...stuff here...
>
>
> My question: is there a way in RE to say: "when I find the text I'm
> looking for, search backwards and find the nearest instance of the
> string "<table>", and then search forwards and find the nearest
> instance of the string "</table>""?
>
> any help is appreciated.
>
> Steve.

It would have been easier if you'd said what the text you are looking
for is, but I think:

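import re
# (?:(?!<table>).)*? cannot cross another <table>, so the match starts
# at the innermost table containing the text.
regex = re.compile(
    r'<table>((?:(?!<table>).)*?text you are looking for.*?)</table>',
    re.DOTALL )
match = regex.search( html_string )
found_table = match.group( 1 ) if match else None  # None when not found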

would work.
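
The "search backwards, then search forwards" idea from the question can
also be done literally with plain string methods instead of a regular
expression. This is only a rough sketch under some assumptions: the tags
appear exactly as "<table>" and "</table>" (no attributes, same case),
only the first occurrence of the text matters, and no other table is
opened or closed between the text and the edges of its enclosing table.
The names extract_table and sample are made up for illustration.

```python
def extract_table(html, needle):
    """Return the nearest <table>...</table> span around needle, or None."""
    pos = html.find(needle)
    if pos == -1:
        return None
    # Search backwards for the nearest opening tag before the text.
    start = html.rfind("<table>", 0, pos)
    # Search forwards for the nearest closing tag after the text.
    end = html.find("</table>", pos)
    if start == -1 or end == -1:
        return None
    return html[start:end + len("</table>")]


# Hypothetical input shaped like the nested example in the question.
sample = (
    "stuff <table> outer <table> middle <table> a "
    "text i'm searching for b </table> c </table> d </table> e"
)
result = extract_table(sample, "text i'm searching for")
print(result)
```

For anything messier than this (attributes on the tags, sibling tables
next to the text), a real HTML parser is probably the safer choice.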

/Odalrick



