[XML-SIG] Extracting info from XHTML with Xpath
luis miguel morillas
morillas at posta.unizar.es
Wed Mar 24 17:58:09 EST 2004
Subject: [XML-SIG] Extracting info from XHTML with Xpath
Date: Wed, Mar 24, 2004 at 03:58:07 -0600
Quoting Tim Wilson (wilson at visi.com):
> Hi everyone,
>
> I'm going to be teaching a course on building Web pages with Web standards
> and I thought it would be fun to show a little demo of a python script that
> could extract information from an XHTML document. I found Simon Willison's
> description of using Xpath and Python, but I haven't had any luck getting an
> Xpath expression that works.
>
> I've got a Web page at
>
> http://www.hopkins.k12.mn.us/Pages/district/special/pq/timelytopics.html
>
> that lists a bunch of upcoming tech classes in our school district. I'd like
> to extract the coursetitles and dates.
>
> Would anyone be willing to have a quick look at the source for that page and
> suggest a way to address the <h3 class="coursetitle"> and <p class="date">
> information?
>
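Perhaps:

from xml.dom.ext.reader import PyExpat
from xml.xpath import Evaluate
from xml.dom.ext import PrettyPrint

# Select every <h3 class="coursetitle"> element in the document.
# The same pattern with '//p[@class="date"]' selects the dates.
path0 = '//h3[@class="coursetitle"]'

# Parse the page into a DOM tree
reader = PyExpat.Reader()
dom = reader.fromUri('http://www.hopkins.k12.mn.us/Pages/district/special/pq/timelytopics.html')

# Evaluate the XPath expression from the root element and print each match
myElements = Evaluate(path0, dom.documentElement)
for element in myElements:
    PrettyPrint(element)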
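
For comparison, here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree, which supports a limited XPath subset. It is not the PyXML approach above. The SAMPLE markup, the extract_courses helper, and the assumption that the page declares the XHTML namespace are all hypothetical, since the real page's source is not shown here. It also assumes each course title is followed by exactly one date, so the two lists can be paired in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the real page; the actual markup is assumed.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h3 class="coursetitle">Intro to Spreadsheets</h3>
    <p class="date">April 5, 2004</p>
    <h3 class="coursetitle">Web Page Basics</h3>
    <p class="date">April 12, 2004</p>
    <p class="note">Bring a laptop.</p>
  </body>
</html>"""

# Prefix-to-URI map so the path expressions can name XHTML elements.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_courses(xhtml_text):
    """Return (title, date) pairs, assuming one date per course title."""
    root = ET.fromstring(xhtml_text)
    # ElementTree understands the [@attribute='value'] predicate form.
    titles = root.findall(".//x:h3[@class='coursetitle']", NS)
    dates = root.findall(".//x:p[@class='date']", NS)
    return [
        ("".join(t.itertext()).strip(), "".join(d.itertext()).strip())
        for t, d in zip(titles, dates)
    ]


courses = extract_courses(SAMPLE)
for title, date in courses:
    print(f"{title}: {date}")
```

If the real page does not declare the XHTML namespace, the prefixed names would not match; in that case the plain forms ".//h3[@class='coursetitle']" and ".//p[@class='date']" would be the ones to try.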
-- lm