[XML-SIG] Extracting info from XHTML with Xpath

Tim Wilson wilson at visi.com
Thu Mar 25 01:20:21 EST 2004

I've got a ton to learn about XML processing, but I was able to piece the
following together using libxml2 and Simon Willison's information at


import libxml2
import urllib2

url = 

dom = libxml2.parseDoc(urllib2.urlopen(url).read())
ctxt = dom.xpathNewContext()
ctxt.xpathRegisterNs('xhtml', 'http://www.w3.org/1999/xhtml')

titles = [t.content for t in
newtitles = []
for title in titles:
    newtitles.append(' '.join([word.strip() for word in title.split()]))
for title in newtitles:
    print title
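For anyone without the libxml2 bindings, here is a minimal sketch of the same namespace-aware lookup using Python's standard xml.etree.ElementTree instead. This is a swapped-in library, not what the post used. The sample page, the h2 element and the extract_titles/element_text helpers are hypothetical, because the archived message no longer shows the URL or the XPath expression. As with registering the 'xhtml' prefix above, the prefix map is required: XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so an unprefixed .//h2 path matches nothing.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'
NS = {'xhtml': XHTML_NS}  # prefix map, analogous to xpathRegisterNs('xhtml', ...)

# Hypothetical sample page; the real URL from the post is not recoverable.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h2>  Extracting   info
          from XHTML </h2>
    <h2>Another <em>emphasised</em>   title</h2>
  </body>
</html>"""


def element_text(el):
    # Hypothetical helper: concatenates all descendant text nodes,
    # roughly what libxml2's node.content gives for an element.
    return ''.join(el.itertext())


def extract_titles(xhtml_source, path='.//xhtml:h2'):
    # Hypothetical helper; the default path is an illustrative assumption.
    root = ET.fromstring(xhtml_source)
    return [element_text(el) for el in root.findall(path, NS)]


if __name__ == '__main__':
    for title in extract_titles(SAMPLE):
        # Collapse runs of whitespace, as the original loop does.
        print(' '.join(title.split()))
```

ElementTree only understands a limited subset of XPath, so more complex expressions would still need libxml2 (or a similar full XPath engine).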

I couldn't find any way to remove the extraneous whitespace from the element
contents without all the splitting, stripping, and joining.
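A small illustration of the whitespace cleanup, written for current Python 3. str.split() with no argument already splits on runs of whitespace and drops leading and trailing whitespace, so the per-word strip() in the loop above changes nothing; the regex version is an alternative that gives the same result for ordinary spaces, tabs and newlines. The function names and the sample string are hypothetical. XPath 1.0 also has a normalize-space() string function that does this collapsing inside the query, but it returns a single string for the first matching node rather than one result per node.

```python
import re


def original_cleanup(title):
    # The approach from the post: split, strip each word, join with spaces.
    return ' '.join([word.strip() for word in title.split()])


def collapse_whitespace(title):
    # split() with no separator already discards all surrounding whitespace,
    # so each resulting word needs no further strip().
    return ' '.join(title.split())


def collapse_whitespace_re(title):
    # Regex alternative: turn each whitespace run into one space, then trim the ends.
    return re.sub(r'\s+', ' ', title).strip()


sample = '\n    Extracting   info\n\tfrom XHTML  \n'
for cleanup in (original_cleanup, collapse_whitespace, collapse_whitespace_re):
    print(cleanup(sample))  # each prints: Extracting info from XHTML
```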


Tim Wilson
Twin Cities, Minnesota, USA
Educational technology guy, Linux and OS X fan, Grad. student, Daddy
mailto: wilson at visi.com   aim: tis270   public key: 0x8C0F8813
