[XML-SIG] Extracting info from XHTML with Xpath
Thomas B. Passin
tpassin at comcast.net
Wed Mar 24 23:53:32 EST 2004
Tim Wilson wrote:
> On 3/24/04 7:11 PM, "Thomas B. Passin" <tpassin at comcast.net> wrote:
>
>
>>The question is - are you asking for help with getting Python to apply
>>an xpath expression, or are you asking for help in writing the correct
>>xpath expressions?
>
>
> Hey Tom,
>
> I understand how to use the Xpath expression once it's created. I'm just
> having trouble finding the right expression. I wrote a little script that
> used Xpath on an RSS feed a year or so ago, but this XHTML file has me
> puzzled.
>
It is simple in XSLT. The expression would look like this:
//html:h3[@class='coursetitle']
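The problem is that your document uses a default namespace, so its
elements are in the XHTML namespace but no prefix is bound to it. XPath
has no standard way to refer to a default namespace: an unprefixed name
in an expression only matches elements that are in no namespace. In
XSLT, you can bind a prefix to the XHTML namespace in the stylesheet,
and that is where the "html:" would come from.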
Some XPath processors let you bind a prefix to a namespace yourself,
but I am not sure whether the PyXML one does.
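
As a rough sketch of what binding a prefix looks like in a processor that
allows it, here is the idea using the standard library's
xml.etree.ElementTree rather than PyXML. ElementTree only implements a
subset of XPath, and the sample document, the course titles, and the
course_titles() helper are made up for illustration, since the real page
was not posted.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample; the real XHTML page was not posted in the thread.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h3 class="coursetitle">Algebra I</h3>
    <h3 class="other">Not a course</h3>
    <h3 class="coursetitle">Biology</h3>
  </body>
</html>"""


def course_titles(xhtml_text):
    """Return the text of every h3 element with class 'coursetitle'."""
    root = ET.fromstring(xhtml_text)
    # Bind the prefix "html" to the XHTML namespace for this query only.
    # The prefix name is arbitrary; only the namespace URI has to match.
    namespaces = {"html": XHTML_NS}
    # ElementTree paths are relative to the element, hence ".//" not "//".
    path = ".//html:h3[@class='coursetitle']"
    return [h3.text for h3 in root.findall(path, namespaces)]


titles = course_titles(SAMPLE)
print(titles)
```

The prefix lives only in the dictionary passed to findall(), so the
source document can keep its default namespace unchanged.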
One approach would be to run the source through an XSLT stylesheet that
is an identity transform except that it adds an explicit namespace
prefix to every element. Then an XPath expression like the one above
would work.
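
For what it is worth, the same re-serialization idea can be sketched in
Python without a stylesheet. This swaps the XSLT identity transform for
ElementTree's serializer, which writes an explicit prefix once one is
registered for the namespace. The input string is a made-up stand-in for
the real page, and this is only one possible way to do it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical input that relies on a default namespace.
SOURCE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<h3 class="coursetitle">Algebra I</h3>'
    '</body></html>'
)

# Ask the serializer to write XHTML elements with an explicit "html:" prefix.
ET.register_namespace("html", XHTML_NS)

root = ET.fromstring(SOURCE)
prefixed = ET.tostring(root, encoding="unicode")
print(prefixed)

# The prefixed copy describes the same elements, so the query matches it
# as long as the processor is told what "html" stands for.
reparsed = ET.fromstring(prefixed)
matches = reparsed.findall(
    ".//html:h3[@class='coursetitle']", {"html": XHTML_NS}
)
print([h3.text for h3 in matches])
```

Note that the query still has to be given a binding for "html"; the
rewritten document just makes the prefix visible in the markup.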
Cheers,
Tom P