
I am looking for a good XPath example. Let's say I have this XML file:

    <?xml version="1.0"?>
    <catalog>
      <book id="bk101">
        <genre><s>Computer</s></genre>
        <price><f>44.95</f></price>
        <publish_date><d>2000-10-01</d></publish_date>
        <description><s>An in-depth look at creating applications with XML.</s></description>
      </book>
    </catalog>

How can I extract only the price of a book which has a genre of "computers"? How can I extract the price and description of the book with id "bk101"?

Note that I have not tested any of these, but they should at least be close.
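    # //book                    find all book elements
    # [genre/s = "Computer"]    ...keep those whose genre/s child has the text "Computer"
    # /price/f                  ...and select the f element under price
    prices = rootElem.xpath('//book[genre/s = "Computer"]/price/f')

The comparison has to go inside a predicate. Written as a bare path followed by = "..." the whole expression is a comparison, so xpath() returns a boolean rather than a list of nodes. The match is also exact and case-sensitive, so the value must be "Computer" as it appears in the file, not "computers".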
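To make the difference between a bare comparison and a predicate concrete, here is a minimal sketch against the sample catalog from the question. It assumes lxml is installed and parses the XML from an inline string; the names XML, root, as_boolean, prices and no_match are only for illustration.

```python
from lxml import etree

# Sample catalog copied from the question, parsed from a byte string
XML = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <genre><s>Computer</s></genre>
    <price><f>44.95</f></price>
    <publish_date><d>2000-10-01</d></publish_date>
    <description><s>An in-depth look at creating applications with XML.</s></description>
  </book>
</catalog>"""

root = etree.fromstring(XML)

# A bare comparison is a boolean expression, so the result is true/false
as_boolean = root.xpath('//book/genre/s = "Computer"')

# Moving the comparison into a predicate filters the books instead
prices = root.xpath('//book[genre/s = "Computer"]/price/f/text()')

# The comparison is exact and case-sensitive, so "computers" matches nothing
no_match = root.xpath('//book[genre/s = "computers"]/price/f/text()')

print(as_boolean, prices, no_match)
```

Only the predicate form gives back something you can pull a price out of; the third query shows why the spelling of the genre matters.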
How can I extract the price and description of the book with id "bk101"?
For this, I would use separate queries to get the book nodes and the subnodes, though there may be a better way using XPath's union operator (|). It might also be faster to use lxml's own element API (for example find()) to get the subnodes you want than to use the second and third xpath calls:

    # Similar to the previous query with one addition
    # [@id=...]   where the id attribute is...
    books = rootElem.xpath('//book[@id="bk101"]')
    for book in books:
        print(book.xpath('./price/f')[0].text)
        print(book.xpath('./description/s')[0].text)

There is a great XPath tutorial and reference here. It should give you all the information you need:

http://www.w3schools.com/xpath/default.asp

Good luck,
--Brad
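As a rough sketch of the single-query alternative, XPath's union operator (|) can fetch both subnodes in one call. This is untested against anything but the sample catalog from the question, assumes lxml is installed, and uses illustrative names (XML, root, fields, values) that are not from the thread.

```python
from lxml import etree

# Sample catalog copied from the question, parsed from a byte string
XML = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <genre><s>Computer</s></genre>
    <price><f>44.95</f></price>
    <publish_date><d>2000-10-01</d></publish_date>
    <description><s>An in-depth look at creating applications with XML.</s></description>
  </book>
</catalog>"""

root = etree.fromstring(XML)

# The union operator (|) merges the two node-sets into one result list
fields = root.xpath(
    '//book[@id="bk101"]/price/f | //book[@id="bk101"]/description/s'
)

# Key each value by the tag of its parent element (price, description)
values = {el.getparent().tag: el.text for el in fields}
print(values)
```

Whether this is actually nicer than two small queries is a matter of taste; the predicate has to be repeated on both sides of the union.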

On Thu, 21 Apr 2011 08:26:24 -0400 Mag Gam <magawake@gmail.com> wrote:
http://www.w3.org/TR/xpath/
http://www.w3schools.com/xpath/default.asp

    from lxml import etree

    dom = etree.parse('d.xml')

    # Price of the book whose genre is "Computer"
    p = dom.xpath('//book[genre/s="Computer"]/price/f/text()')[0]

    # Price and description of book bk101
    bk = dom.xpath('//book[@id="bk101"]')[0]
    p2 = bk.xpath('.//price/f/text()')[0]
    d = bk.xpath('.//description/s/text()')[0]

    print(p, p2, d)

It might be better to do something like

    d = ' '.join(bk.xpath('.//description//text()'))

to collect all the text under the description node; it depends on whether there is ever more than one <s/> element, etc.

Cheers -Terry
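To illustrate why joining the text nodes can matter, here is a small sketch using a hypothetical variant of the catalog in which the description holds two <s/> children. That variant is an assumption made up for the example, not part of the original file; lxml is assumed to be installed.

```python
from lxml import etree

# Hypothetical variant of the catalog: the description has two <s> children
XML = b"""<catalog>
  <book id="bk101">
    <description><s>An in-depth look</s><s>at creating applications with XML.</s></description>
  </book>
</catalog>"""

root = etree.fromstring(XML)
bk = root.xpath('//book[@id="bk101"]')[0]

# Indexing with [0] keeps only the first <s> and silently drops the rest
first_only = bk.xpath('.//description/s/text()')[0]

# Joining every text node under description keeps all of it
all_text = ' '.join(bk.xpath('.//description//text()'))

print(first_only)
print(all_text)
```

If the file is pretty-printed with whitespace between the <s/> elements, the joined string will pick up those whitespace nodes too, so some trimming may be needed.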

Participants (3): Brad Smith, Mag Gam, Terry Brown