[lxml-dev] Some problem with an xpath

15 Jun 2006

Hi,

I'd like to extract a string from an HTML document without caring where
it is in the tree. However, somehow my XPath expression returns all
nodes :-(

I tried to reproduce the problem with a mini script, but
...
...
...
xml=etree.XML('<a><b>Test</b><b>Test</b><b>Tf<e />Test</b><b>dgfd</b></a>')
xml.xpath('//b[fn:contains(self::text(),\'Test\')]')
[<Element b at ...>, <Element b at ...>, <Element b at ...>, <Element b at ...>]
works as expected; however, for my HTML page I get
...
...
...
html = etree.parse('/home/andreas/public_html/batman_dvd.html',etree.HTMLParser())
html.xpath('//font[fn:contains(self::text(),\'Minuten\')]/text()')
['\n                   ', '\n                   ', '\n                   ', '\n                    ', u' Die folgenden Daten wurden noch nicht redaktionell \xfcberpr\xfcft.\n                   ', '\n                     ', '\n                    ', '\n                       ', '\n                      ', 'Erscheinungsart:\n                      ', '\n                       ', '\n                      ', 'Label:\n                      ', '\n                       ', '\n                      ', u'V\xd6-Termin:\n
The page can be seen at http://www.ofdb.de/view.php?page=fassung&fid=1130&vid=148784

Is this a problem with lxml or with my XPath expression? Even if I provide a
more appropriate "start path", i.e. select a table deep in the hierarchy
that contains the element I am looking for, I still get a lot of text nodes back.
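
For comparison, here is a minimal sketch of variants that could be tried on the mini document from above. It is not a confirmed fix for the page in question. It assumes only standard XPath 1.0 behaviour: contains() is a core function and can be called without a prefix, and self::text() in a predicate on an element selects nothing, because the context node is an element and not a text node. The variable names are made up for illustration.

```python
from lxml import etree

# Same mini document as in the script above.
xml = etree.XML('<a><b>Test</b><b>Test</b><b>Tf<e />Test</b><b>dgfd</b></a>')

# Variant 1: test the string value of each <b> (all descendant text joined).
by_string_value = xml.xpath("//b[contains(., 'Test')]")

# Variant 2: test every direct text node of <b> on its own.
# contains(text(), 'Test') would only look at the first text node.
by_text_node = xml.xpath("//b[text()[contains(., 'Test')]]")

# Variant 3: return the matching text nodes themselves instead of elements.
matching_text = xml.xpath("//b/text()[contains(., 'Test')]")

print([b.text for b in by_string_value])
print(len(by_text_node))
print(matching_text)
```

On the HTML page the same pattern would read //font/text()[contains(., 'Minuten')], which should return only the text nodes that actually contain the string, if the page structure is as assumed.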

Andreas

-- 
You definitely intend to start living sometime soon.
