[XML-SIG] Extracting info from XHTML with Xpath

Thomas B. Passin tpassin at comcast.net
Wed Mar 24 23:53:32 EST 2004


Tim Wilson wrote:

> On 3/24/04 7:11 PM, "Thomas B. Passin" <tpassin at comcast.net> wrote:
> 
> 
>>The question is - are you asking for help with getting Python to apply
>>an xpath expression, or are you asking for help in writing the correct
>>xpath expressions?
> 
> 
> Hey Tom,
> 
> I understand how to use the Xpath expression once it's created. I'm just
> having trouble finding the right expression. I wrote a little script that
> used Xpath on an RSS feed a year or so ago, but this XHTML file has me
> puzzled.
> 


It is simple in XSLT.  The expression would be something like this:

//html:h3[@class='coursetitle']

The problem is that your document uses a default namespace, and XPath 1.0 
has no standard way to refer to it: an unprefixed name test such as h3 
matches only elements that are in no namespace.  Because no prefix is 
bound to the XHTML namespace, there is nothing to put in front of h3.  In 
XSLT you can bind a prefix in the stylesheet, and that is where the 
"html:" would come from.

Some XPath processors let you bind a namespace prefix from the calling 
code, but I am not sure whether the PyXML one does.
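For comparison, here is a minimal sketch using Python's standard xml.etree.ElementTree (not PyXML), whose findall() accepts a namespaces mapping and so lets you bind a prefix from the calling code. ElementTree supports only a subset of XPath, and the sample document and the course_titles() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML fragment; note the default namespace on <html>.
XHTML_SOURCE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h3 class="coursetitle">Intro to Python</h3>
    <h3 class="other">Not a course title</h3>
    <h3 class="coursetitle">XML Processing</h3>
  </body>
</html>"""


def course_titles(source):
    """Hypothetical helper: return (unprefixed matches, titles found via a bound prefix)."""
    root = ET.fromstring(source)
    # An unprefixed name only matches elements in no namespace,
    # so this finds nothing in an XHTML document.
    unprefixed = root.findall(".//h3[@class='coursetitle']")
    # Binding "html" to the XHTML namespace URI lets the prefixed
    # expression match the elements in the default namespace.
    prefixed = root.findall(".//html:h3[@class='coursetitle']",
                            namespaces={"html": XHTML_NS})
    return unprefixed, [h3.text for h3 in prefixed]


unprefixed, titles = course_titles(XHTML_SOURCE)
print(len(unprefixed))  # 0
print(titles)           # ['Intro to Python', 'XML Processing']
```

The prefix name "html" is arbitrary here; what matters is that it maps to the same namespace URI the document declares.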

One approach would be to run the source through an XSLT stylesheet that 
is an identity transform except that it adds an explicit namespace 
prefix.  Then an XPath expression like the one above would work.
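The sketch below is not an XSLT identity transform; it assumes Python's standard xml.etree.ElementTree and gets a comparable result by parsing the document and re-serializing it with an explicit "html" prefix registered for the XHTML namespace. The add_explicit_prefix() helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def add_explicit_prefix(source, prefix="html"):
    """Hypothetical helper: re-serialize XHTML so every element carries an explicit prefix."""
    # Note: register_namespace() changes a module-wide prefix map,
    # so this affects later serializations in the same process too.
    ET.register_namespace(prefix, XHTML_NS)
    root = ET.fromstring(source)
    # Elements keep the same namespace; only how it is written changes.
    return ET.tostring(root, encoding="unicode")


source = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<h3 class="coursetitle">Intro</h3></body></html>')
print(add_explicit_prefix(source))
```

The output declares xmlns:html on the root and writes each element as html:body, html:h3, and so on. The elements' namespace is unchanged, so the prefix only helps a processor that can see or be given that binding.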

Cheers,

Tom P



