BeautifulSoup

Paul McGuire ptmcg at austin.rr.com
Fri Aug 19 14:39:27 EDT 2005


Here's a pyparsing program that reads my personal web page and spits
out its HTML with all of the HREFs reversed.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)



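# Python 3 version; urllib.urlopen and the print statement are Python 2 only
from pyparsing import Literal, quotedString
from urllib.request import urlopen

LT = Literal("<")
GT = Literal(">")
EQUALS = Literal("=")
# matches <A HREF="..."> exactly (case-sensitive, no other attributes)
htmlAnchor = (LT + "A" + "HREF" + EQUALS +
              quotedString.setResultsName("href") + GT)

def convertHREF(s, l, toks):
    # do HREF conversion here - for demonstration, we will just reverse them
    print(toks.href)
    return "<A HREF=%s>" % toks.href[::-1]

htmlAnchor.setParseAction(convertHREF)

# GeoCities closed in 2009; substitute any live page to run this
inputURL = "http://www.geocities.com/ptmcg"
with urlopen(inputURL) as inputPage:
    # read() returns bytes in Python 3; the page encoding is assumed here
    inputHTML = inputPage.read().decode("utf-8", errors="replace")

print(htmlAnchor.transformString(inputHTML))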
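Since the page above is gone, here is a minimal offline sketch of the same grammar applied to a hypothetical inline sample string (sampleHTML is made up for illustration). It assumes Python 3 and a pyparsing release that still accepts the camelCase names setResultsName, setParseAction and transformString. It also shows two details of the approach: quotedString keeps the quote characters, so the reversed value is still quoted, and the literals are case-sensitive, so a lowercase <a href=...> tag passes through unchanged.

```python
from pyparsing import Literal, quotedString

# Same grammar as in the post: <A HREF="..."> with nothing else in the tag.
LT = Literal("<")
GT = Literal(">")
EQUALS = Literal("=")
htmlAnchor = LT + "A" + "HREF" + EQUALS + quotedString.setResultsName("href") + GT

def convertHREF(s, l, toks):
    # toks.href includes its surrounding quotes, so reversing the whole
    # string leaves a quote at each end and the attribute stays well-formed.
    return "<A HREF=%s>" % toks.href[::-1]

htmlAnchor.setParseAction(convertHREF)

# Hypothetical sample page standing in for the live URL.
sampleHTML = ('<P>See <A HREF="http://example.com/docs">the docs</A> '
              'and <a href="x.html">this</a>.</P>')

# transformString copies the unmatched text through and substitutes the
# parse action's return value for each matched anchor tag.
result = htmlAnchor.transformString(sampleHTML)
print(result)
```

If lowercase tags should be matched as well, pyparsing's CaselessLiteral could be used in place of the plain "A" and "HREF" strings.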



