BeautifulSoup
Paul McGuire
ptmcg at austin.rr.com
Fri Aug 19 14:39:27 EDT 2005
Here's a pyparsing program that reads my personal web page and prints
the HTML back out with every HREF value reversed.
-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)
from pyparsing import Literal, quotedString
import urllib
LT = Literal("<")
GT = Literal(">")
EQUALS = Literal("=")
htmlAnchor = LT + "A" + "HREF" + EQUALS + \
    quotedString.setResultsName("href") + GT
def convertHREF(s,l,toks):
    # do HREF conversion here - for demonstration, we will just reverse them
    print toks.href
    return "<A HREF=%s>" % toks.href[::-1]
htmlAnchor.setParseAction( convertHREF )
inputURL = "http://www.geocities.com/ptmcg"
inputPage = urllib.urlopen(inputURL)
inputHTML = inputPage.read()
inputPage.close()
print htmlAnchor.transformString( inputHTML )
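
The program above is written for Python 2 (print statements, urllib.urlopen). As a rough sketch of the same idea under Python 3, the version below keeps the grammar and parse action from the post but runs them on an inline string instead of fetching a page. The sample HTML and the example.com URL are made up for illustration, and it assumes a pyparsing release that still accepts the camelCase method names used in the post.

```python
from pyparsing import Literal, quotedString

LT = Literal("<")
GT = Literal(">")
EQUALS = Literal("=")

# Same grammar as in the post: <A HREF="..."> with the quoted value named "href".
htmlAnchor = (LT + "A" + "HREF" + EQUALS +
              quotedString.setResultsName("href") + GT)

def convertHREF(s, l, toks):
    # quotedString keeps the surrounding quotes, so reversing the token
    # still leaves a quote character at each end.
    return "<A HREF=%s>" % toks.href[::-1]

htmlAnchor.setParseAction(convertHREF)

# Hypothetical sample input, used here instead of a live page.
sample = '<p><A HREF="http://example.com/a">link</A></p>'
result = htmlAnchor.transformString(sample)
print(result)
```

Because the grammar is built from plain Literal strings, matching is case-sensitive: a lowercase `<a href="...">` tag, or an HREF without quotes, is passed through unchanged.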