re troubles
Bengt Richter
bokr at oz.net
Thu Dec 18 21:48:42 EST 2003
On Thu, 18 Dec 2003 17:22:54 -0600, Evanda Remington <evanda at remingtons.org> wrote:
>I'm trying to filter some rows of an html table out, based on their
>contents. For input like:
>"""
><table>
> <tr>
> <td>Lasers</td><td>17</td> </tr>
> <tr> << want to filter
> <td>kittens</td><td>8</td> << this out.
> </tr> <<
> <tr> <td>robots</td><td>8</td> </tr>
></table>
>"""
>I would like to completely remove the (3 line) table row that mentions
>kittens. The regexp I have tried to use is: r"<tr>.*?kittens.*?</tr>".
>When compiled and used with sub("",data), it strangely removes everything
>from the first "<tr>" to the first "<tr>" after kittens.
>
>That is, the ".*?" notation works in the second half, but not in the first
>half, where it behaves just as ".*" would.
>
>Any advice?
>
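The symptom is easy to reproduce. Below is a minimal Python 3 sketch. It assumes the pattern was compiled with re.DOTALL, since the question doesn't name the flags and without DOTALL "." can't cross the line breaks, so nothing would match at all. The "<<" margin notes are left out of the sample data.

```python
import re

# The table from the question, without the "<<" margin notes.
data = """<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
<tr>
<td>kittens</td><td>8</td>
</tr>
<tr> <td>robots</td><td>8</td> </tr>
</table>
"""

# The pattern from the question; re.DOTALL is an assumption (see above).
naive = re.compile(r"<tr>.*?kittens.*?</tr>", re.DOTALL)

m = naive.search(data)
# The leftmost possible start wins, so the match begins at the FIRST <tr>
# (the Lasers row); the lazy ".*?" only limits how far the match extends.
print(repr(m.group()))
print("Lasers" in m.group())  # True: the Lasers row is swallowed as well
```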
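See if this will work for you. I added some more kittens and robots, since a
single instance could be handled differently. I used 'XXX' rather than '' as the
replacement so the removed spans are easy to see.

The lazy ".*?" can't help in the first half because the regex engine returns the
leftmost match: it tries start positions from left to right, and the pattern
already succeeds at the very first "<tr>", with ".*?" stretching only as far as it
must to reach "kittens", i.e. right across the Lasers row. Laziness limits how far
a match extends, not where it starts. So the pattern below forbids the text
between "<tr>" and "kittens" from containing another "<tr>".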
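For the single-instance case, here is one possible approach that avoids regexes entirely. It is an illustration, not necessarily what was meant above: find "kittens", search backward for the nearest "<tr>", and search forward for the next "</tr>". The helper name remove_row_containing is hypothetical. It assumes rows are not nested and the tags are written exactly as <tr> and </tr> (lowercase, no attributes).

```python
def remove_row_containing(html, word, replacement=""):
    """Replace the <tr>...</tr> row enclosing the first occurrence of word.

    Hypothetical helper: assumes non-nested rows and bare <tr>/</tr> tags.
    Returns html unchanged if word (or an enclosing row) isn't found.
    """
    hit = html.find(word)
    if hit == -1:
        return html
    start = html.rfind("<tr>", 0, hit)  # nearest row start before the word
    end = html.find("</tr>", hit)       # first row end after the word
    if start == -1 or end == -1:
        return html
    end += len("</tr>")                 # include the closing tag itself
    return html[:start] + replacement + html[end:]


table = ("<table>\n<tr>\n<td>Lasers</td><td>17</td> </tr>\n"
         "<tr>\n<td>kittens</td><td>8</td>\n</tr>\n"
         "<tr> <td>robots</td><td>8</td> </tr>\n</table>\n")
print(remove_row_containing(table, "kittens", "XXX"))
```

Only the first hit is handled. A loop around it would cover repeats, but that is where a single regex substitution becomes the simpler tool.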
====< evanda.py >====================
import re
s = """\
<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
<tr> << want to filter
<td>kittens</td><td>8</td> << this out.
</tr> <<
<tr> <td>robots</td><td>8</td> </tr>
<tr> << want to filter
<td>more kittens</td><td>8</td> << this out.
</tr> <<
<tr> <td>more robots</td><td>8</td> </tr>
</table>
"""
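# Match "<tr>", then lazily any text that cannot contain another "<tr>":
# each alternative takes either one non-"<" character or a "<" together
# with the character(s) showing it does not begin "<tr>". So the match
# can no longer start at an earlier row and run on into the kittens row.
# (?s) lets "." match newlines; (?m) only affects ^ and $, unused here.
rxo = re.compile(r"(?ms)<tr>(?:[^<]|<[^t]|<t[^r]|<tr[^>])*?kittens.*?</tr>")
# A parenthesized print works in both Python 2 and Python 3.
print('==== before ====\n%s==== after sub XXX ====\n%s====' % (s, rxo.sub('XXX', s)))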
=====================================
Result:
[19:02] C:\pywk\clp>evanda.py
==== before ====
<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
<tr> << want to filter
<td>kittens</td><td>8</td> << this out.
</tr> <<
<tr> <td>robots</td><td>8</td> </tr>
<tr> << want to filter
<td>more kittens</td><td>8</td> << this out.
</tr> <<
<tr> <td>more robots</td><td>8</td> </tr>
</table>
==== after sub XXX ====
<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
XXX <<
<tr> <td>robots</td><td>8</td> </tr>
XXX <<
<tr> <td>more robots</td><td>8</td> </tr>
</table>
====
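Two other ways to get the same result are sketched below in Python 3. Both are swapped-in techniques, not the one used in the listing above. (1) A "tempered" lazy token, (?:(?!<tr>).)*?, uses a negative lookahead so the text before "kittens" can't contain another <tr>. (2) Every row is matched with a plain lazy <tr>.*?</tr>, and a replacement function decides whether to drop it. Both assume non-nested rows and tags written exactly as <tr> and </tr>.

```python
import re

s = """<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
<tr>
<td>kittens</td><td>8</td>
</tr>
<tr> <td>robots</td><td>8</td> </tr>
<tr>
<td>more kittens</td><td>8</td>
</tr>
<tr> <td>more robots</td><td>8</td> </tr>
</table>
"""

# (1) Tempered token: a character before "kittens" is consumed only if a
# new "<tr>" does not start at that position, so no match can begin at an
# earlier row and run across the next one.
tempered = re.compile(r"<tr>(?:(?!<tr>).)*?kittens.*?</tr>", re.DOTALL)

# (2) Match each row on its own, then decide in a callback.
row = re.compile(r"<tr>.*?</tr>", re.DOTALL)

def drop_kitten_rows(m):
    # Replacement for one row: "XXX" if it mentions kittens, else unchanged.
    return "XXX" if "kittens" in m.group() else m.group()

out1 = tempered.sub("XXX", s)
out2 = row.sub(drop_kitten_rows, s)
print(out1)
print(out1 == out2)  # True: both replace exactly the two kitten rows
```

The callback version is often easier to extend, because the keep-or-drop test is ordinary Python rather than part of the pattern.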
Regards,
Bengt Richter
More information about the Python-list mailing list