[XML-SIG] Extracting info from XHTML with Xpath

Thu Mar 25 00:49:55 EST 2004

Thomas B. Passin wrote:
> Some xpath processors let you bind a namespace, but I am not sure about 
> the PyXML one.

Yes, it's quite possible.

To expand on Luis's example, I added the 'html' prefix to the XPath
expression and created an explicit XPath context with the appropriate
binding, rather than letting Evaluate() create a simple default context
(don't worry if you don't know what I mean by that). Note that I haven't
tested this:

from xml.dom.ext.reader import PyExpat
from xml.xpath import Evaluate
from xml.xpath.Context import Context
from xml.dom.ext import PrettyPrint

path0 = '//html:h3[@class="coursetitle"]'

reader = PyExpat.Reader()
dom = reader.fromUri('http://www.hopkins.k12.mn.us/Pages/district/special/pq/timelytopics.html')

ctx = Context(dom.documentElement, processorNss={'html': 'http://www.w3.org/1999/xhtml'})
myElements = Evaluate(path0, context=ctx)
for element in myElements:
    PrettyPrint(element)
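
For comparison, here is a rough sketch of the same prefix-binding idea using
the standard library's xml.etree.ElementTree instead of PyXML. This is a
swapped-in library, not the PyXML API, and ElementTree supports only a subset
of XPath. The inline SAMPLE document and the course_titles() helper are
made up for illustration; they stand in for the real page so nothing is
fetched over the network.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the real XHTML page.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h3 class="coursetitle">Algebra</h3>
    <h3 class="other">Not a course</h3>
    <div><h3 class="coursetitle">Biology</h3></div>
  </body>
</html>"""


def course_titles(xhtml_text):
    """Return the text of every h3 with class="coursetitle"."""
    root = ET.fromstring(xhtml_text)
    # The 'html' prefix is bound to the XHTML namespace through the
    # namespaces mapping, much as processorNss does for the PyXML Context.
    nodes = root.findall(
        './/html:h3[@class="coursetitle"]',
        namespaces={"html": XHTML_NS},
    )
    return [node.text for node in nodes]


titles = course_titles(SAMPLE)
```

The prefix in the mapping only has to agree with the prefix used in the
expression; it need not match anything declared in the document itself.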

If you need the empty namespace (no namespace), use xml.dom.EMPTY_NAMESPACE.
If you need an empty prefix to assign the default namespace, use
xml.dom.EMPTY_PREFIX. Note, however, that changing the default namespace does
not affect how QNames are interpreted in XPath expressions: an unprefixed name
in an XPath expression always refers to a name in no namespace.
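
To see why the prefix is needed at all, here is a small sketch of the
unprefixed-name behaviour. It uses the standard library's
xml.etree.ElementTree rather than PyXML, so treat it as an analogy and not
as a demonstration of the PyXML API. The DOC string and count_matches()
helper are invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document whose elements all sit in the default namespace.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><h3>Title</h3></body>"
    "</html>"
)

root = ET.fromstring(DOC)


def count_matches(path, namespaces=None):
    """Count the elements under the root that match the given path."""
    return len(root.findall(path, namespaces=namespaces))


# No prefix and no bindings: the path asks for h3 in no namespace,
# but the document's h3 is in the XHTML namespace.
unprefixed = count_matches(".//h3")

# Any prefix works as long as it is bound to the right namespace URI.
prefixed = count_matches(".//x:h3", {"x": XHTML_NS})
```

Here the unprefixed path matches nothing, while the prefixed one finds the
heading, even though the document never declares an 'x' prefix.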

You can also create variable bindings in the same way, by passing a dictionary
as varBindings. Each key is a (namespace, local-name) tuple naming one
variable.
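
As a sketch of what such a dictionary might look like, the snippet below
builds one for variables in no namespace. The make_var_bindings() helper and
the 'cls' variable name are hypothetical, and the snippet only constructs
the dictionary (using the standard library's xml.dom constant so it runs
without PyXML). Passing it to Context as varBindings is shown only in a
comment, and like the code above I haven't tested that part.

```python
from xml.dom import EMPTY_NAMESPACE


def make_var_bindings(**variables):
    """Build a bindings dict keyed by (namespace, local-name) tuples.

    Hypothetical helper: every variable is placed in no namespace.
    """
    return {(EMPTY_NAMESPACE, name): value for name, value in variables.items()}


bindings = make_var_bindings(cls="coursetitle")

# Assumed (untested) use with PyXML, referencing the variable as $cls:
#   ctx = Context(dom.documentElement, varBindings=bindings,
#                 processorNss={'html': 'http://www.w3.org/1999/xhtml'})
#   Evaluate('//html:h3[@class=$cls]', context=ctx)
```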

The API in 4Suite is about the same, but with Ft.Xml.XPath instead of
xml.xpath, and you must supply a Domlette document, not a minidom document, in
the context.

-Mike