How can I exclude a word by using re?

Paul McGuire ptmcg at
Tue Aug 16 15:18:28 CEST 2005

Just as you used "?P<xxx>" in your re to assign the matching text to
the group name "xxx", pyparsing lets you associate a name with an
element of your grammar using setResultsName.

Here is your original re:
 ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
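
As a quick illustration of how the named group in that pattern behaves,
here is a minimal sketch. It assumes Python 3, so the pattern is written
with a plain r'' prefix instead of ur'', and the sample HTML line is made
up for the example:

```python
import re

# The pattern quoted above, as a Python 3 raw string (ur'' is Python 2 only).
link_re = re.compile(r'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>')

# Hypothetical sample input, not taken from the original page.
sample = '<td><a href="http://example.com/song.mp3" target=_blank>Some Song</a></td>'

match = link_re.search(sample)

# The text matched by the named group is retrieved by its name.
url = match.group("url")
print(url)
```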

Here is the pyparsing expression:
valign + number.setResultsName("number") + tdEnd + \
            tdStart + SkipTo(aStart) + aStart + \
            SkipTo(tdEnd) + tdEnd

Here are the re and pyparsing pieces side by side:
re => pyparsing
valign=top>    =>  valign = CaselessLiteral("valign=top>")
(?P<number>\d{1,2})    =>    number = Word(nums)
</td>       =>    tdEnd
<td[^>]*>    =>   tdStart
\s{0,2}       =>  I don't know what this re does, so I just used
SkipTo(aStart)
<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>     =>  aStart (which
returns a value whose named attributes correspond to the HTML
attributes, such as href)
(?P<name>.+)   =>   SkipTo(tdEnd)  *** here is where we'll make our
change ***
</td>    =>  tdEnd

To capture the body of the second <td></td> tag pair, we'll add
setResultsName("name") to the pyparsing expression:
mp3entry = valign + number.setResultsName("number") + tdEnd + \
            tdStart + SkipTo(aStart) + aStart + \
            SkipTo(tdEnd).setResultsName("name") + tdEnd

Now you should be able to extract the data using:
for toks,s,e in mp3entry.scanString(targetHTML):
    print toks.number, toks.starta.href, toks.name
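
For anyone who wants to try this end to end, here is a rough,
self-contained sketch. It assumes pyparsing is installed and that tdStart,
tdEnd and aStart were built with makeHTMLTags, which this message does not
show. The sample HTML is made up. The results are read by key, and href is
taken from the top-level results rather than through toks.starta, since the
name of that nested result may differ between pyparsing versions:

```python
from pyparsing import CaselessLiteral, SkipTo, Word, makeHTMLTags, nums

# Assumed definitions; the message above does not show how these were built.
tdStart, tdEnd = makeHTMLTags("td")
aStart, aEnd = makeHTMLTags("a")
valign = CaselessLiteral("valign=top>")
number = Word(nums)

mp3entry = (valign + number.setResultsName("number") + tdEnd
            + tdStart + SkipTo(aStart) + aStart
            + SkipTo(tdEnd).setResultsName("name") + tdEnd)

# Made-up sample input for illustration only.
targetHTML = """
<tr><td valign=top>1</td><td> <a href="http://example.com/first.mp3" target=_blank>First Song</a></td></tr>
<tr><td valign=top>2</td><td> <a href="http://example.com/second.mp3" target=_blank>Second Song</a></td></tr>
"""

entries = []
for toks, start, end in mp3entry.scanString(targetHTML):
    # "name" holds everything up to the closing </td>, so it can include
    # trailing markup such as </a>.
    entries.append((toks["number"], toks["href"], toks["name"]))

for entry in entries:
    print(entry)
```

Note that SkipTo(tdEnd) captures the raw text up to the closing td tag, so
you may want to strip any leftover markup from the name afterwards.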

Good luck!
-- Paul
