re troubles

Bengt Richter bokr at oz.net
Thu Dec 18 21:48:42 EST 2003


On Thu, 18 Dec 2003 17:22:54 -0600, Evanda Remington <evanda at remingtons.org> wrote:

>I'm trying to filter some rows of an html table out, based on their
>contents.  For input like:
>"""
><table>
>  <tr>
>    <td>Lasers</td><td>17</td> </tr>
>  <tr>                                            <<  want to filter
>    <td>kittens</td><td>8</td>                    <<  this out.
>  </tr>                                           <<
>  <tr> <td>robots</td><td>8</td> </tr>
></table>
>"""
>I would like to completely remove the (3 line) table row that makes mention
>of kittens.  The regexp I have tried to use is: r"<tr>.*?kittens.*?</tr>".
>When compiled and used with sub("",data), it strangely removes everything
>from the first "<tr>" to the first "</tr>" after kittens.
>
>That is, the ".*?" notation works in the second half, but not in the first
>half.  It behaves the same as ".*" should.
>
>Any advice?
>
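See if this will work for you. The non-greedy ".*?" only limits how far a match
extends once it has started; it does not stop the match from starting at the
first "<tr>" in the data. So the first half of the pattern has to be kept from
running across another "<tr>" on its way to "kittens".

I added some more kittens and robots to show that several rows are handled; a
single instance could be done differently. I used 'XXX' rather than '' as the
replacement so the result is easier to read.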
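
To see the over-match concretely, here is a minimal sketch that reproduces it.
The names `data` and `naive` are just for illustration, and I am assuming
re.DOTALL was in effect in the original attempt (otherwise "." could not cross
the line breaks at all).

```python
import re

# Cut-down version of the table from the question.
data = """\
<table>
  <tr>
    <td>Lasers</td><td>17</td> </tr>
  <tr>
    <td>kittens</td><td>8</td>
  </tr>
  <tr> <td>robots</td><td>8</td> </tr>
</table>
"""

# The pattern from the question; DOTALL is an assumption about how it was compiled.
naive = re.compile(r"<tr>.*?kittens.*?</tr>", re.DOTALL)

m = naive.search(data)
# The search succeeds at the leftmost possible start, i.e. the first <tr>,
# so the Lasers row is part of the match as well.
print(m.group(0))
```

The engine tries the leftmost starting position first, and from the first
"<tr>" a match is possible, so that is the one it reports. Non-greediness only
picks the shortest extension from that starting point.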

====< evanda.py >====================
import re
s = """\
<table>
  <tr>
    <td>Lasers</td><td>17</td> </tr>
  <tr>                                            <<  want to filter
    <td>kittens</td><td>8</td>                    <<  this out.
  </tr>                                           <<
  <tr> <td>robots</td><td>8</td> </tr>
  <tr>                                            <<  want to filter
    <td>more kittens</td><td>8</td>               <<  this out.
  </tr>                                           <<
  <tr> <td>more robots</td><td>8</td> </tr>
</table>
"""
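# (?ms): the s flag lets "." match newlines; m is harmless here (no ^ or $ used).
# After <tr>, consume only text that does not begin another "<tr>" until
# "kittens" is reached, then run non-greedily to the closing </tr>.
rxo = re.compile(r"(?ms)<tr>(?:[^<]|<[^t]|<t[^r]|<tr[^>])*?kittens.*?</tr>")
# Parenthesized form works as a Python 2 print statement and a Python 3 call.
print('==== before ====\n%s==== after sub XXX ====\n%s====' % (s, rxo.sub('XXX', s)))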
=====================================
Result:

[19:02] C:\pywk\clp>evanda.py
==== before ====
<table>
  <tr>
    <td>Lasers</td><td>17</td> </tr>
  <tr>                                            <<  want to filter
    <td>kittens</td><td>8</td>                    <<  this out.
  </tr>                                           <<
  <tr> <td>robots</td><td>8</td> </tr>
  <tr>                                            <<  want to filter
    <td>more kittens</td><td>8</td>               <<  this out.
  </tr>                                           <<
  <tr> <td>more robots</td><td>8</td> </tr>
</table>
==== after sub XXX ====
<table>
  <tr>
    <td>Lasers</td><td>17</td> </tr>
  XXX                                           <<
  <tr> <td>robots</td><td>8</td> </tr>
  XXX                                           <<
  <tr> <td>more robots</td><td>8</td> </tr>
</table>
====
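
The same idea can also be written with a negative lookahead instead of the
character-by-character alternation. This is only a sketch of an alternative,
not what the script above uses; `ROW_WITH_KITTENS`, `drop_kitten_rows` and
`sample` are made-up names. One difference worth noting: the alternation above
will step over a "<tr>" that directly follows a "<" (as in "<<tr>"), while the
lookahead checks every position.

```python
import re

# (?!</?tr>) refuses to step onto the start of "<tr>" or "</tr>", so the text
# between the opening tag and "kittens" has to stay inside a single row.
ROW_WITH_KITTENS = re.compile(r"<tr>(?:(?!</?tr>).)*?kittens.*?</tr>", re.DOTALL)

def drop_kitten_rows(html, replacement=""):
    """Return html with every <tr>...</tr> row that mentions kittens replaced."""
    return ROW_WITH_KITTENS.sub(replacement, html)

sample = (
    "<table>\n"
    "  <tr>\n"
    "    <td>Lasers</td><td>17</td> </tr>\n"
    "  <tr>\n"
    "    <td>kittens</td><td>8</td>\n"
    "  </tr>\n"
    "  <tr> <td>robots</td><td>8</td> </tr>\n"
    "</table>\n"
)

cleaned = drop_kitten_rows(sample)
print(cleaned)
```

Like any regexp approach this assumes plain "<tr>" tags with no attributes and
no nested tables; for messier HTML a real parser is likely the safer route.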

Regards,
Bengt Richter



