Mailman 3 xpath example - lxml - The Python XML Toolkit - python.org

xpath example

Mag Gam

April 21, 2011

8:26 a.m.

I am in search of a good XPath example. Let's say I have this XML file:

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <genre><s>Computer</s></genre>
    <price><f>44.95</f></price>
    <publish_date><d>2000-10-01</d></publish_date>
    <description><s>An in-depth look at creating applications with XML.</s></description>
  </book>
</catalog>

How can I extract only the price of a book which has a genre of "computers"?

How can I extract the price and description of book id "bk101"?

Brad Smith

April 2011

9 a.m.

Note that I have not tested any of these, but they should at least be close.

# //book            find all book tags
# [genre/s = ...]   ...keeping only those with a genre subtag whose s subtag
#                   contains the given text (the match is case-sensitive, so
#                   it has to be "Computer" exactly as in the sample file)
books = rootElem.xpath('//book[genre/s = "Computer"]')
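As a hedged illustration (not part of the original reply, and assuming Python 3 with lxml installed), the sketch below shows how the placement of a comparison changes what xpath() returns: at the top level of the expression it yields a boolean, while inside a [...] predicate it filters the book elements. The XML is the sample from the question, inlined as a byte string; the variable names are illustrative only.

```python
from lxml import etree

# Sample document from the question, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <genre><s>Computer</s></genre>
    <price><f>44.95</f></price>
    <publish_date><d>2000-10-01</d></publish_date>
    <description><s>An in-depth look at creating applications with XML.</s></description>
  </book>
</catalog>"""

root = etree.fromstring(XML)

# A comparison at the top level of the expression evaluates to a boolean.
as_boolean = root.xpath('//book/genre/s = "Computer"')

# The same comparison inside a predicate selects the matching book elements.
as_nodes = root.xpath('//book[genre/s = "Computer"]')

# String comparison is case-sensitive, so "computers" matches nothing here.
no_match = root.xpath('//book[genre/s = "computers"]')

# Continuing the path after the predicate reaches the price text directly.
prices = root.xpath('//book[genre/s = "Computer"]/price/f/text()')

print(as_boolean, [b.get("id") for b in as_nodes], no_match, prices)
```

Because xpath() returns a list for node-set results, an empty list is the signal that nothing matched; indexing it with [0] without checking would raise an IndexError.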

How can I extract the price and description of book id "bk101"?

For this, I would use separate queries to get the book nodes and the subnodes, though there may be a better way using XPath's union operator ("|"). It might also be faster to use lxml to get the subnodes you want than to use the second and third xpath calls:

# Similar to the previous query with one addition
# [@id=...]   where the id attribute is...
books = rootElem.xpath('//book[@id="bk101"]')
for book in books:
    print(book.xpath('./price/f')[0].text)
    print(book.xpath('./description/s')[0].text)

There is a great XPath tutorial and reference here. It should give you all the information you need:
http://www.w3schools.com/xpath/default.asp

Good luck,
--Brad
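One possible reading of "use lxml to get the subnodes" is the ElementTree-style find()/findtext() API that lxml elements provide. The sketch below is an assumption about what was meant, not something tested in the thread. It falls back to the standard library's xml.etree.ElementTree, which offers the same two methods, if lxml is not installed. The XML is the sample from the question and the variable names are illustrative.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback for this sketch only: the standard library exposes the same
    # find()/findtext() methods used below.
    import xml.etree.ElementTree as etree

# Sample document from the question, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <genre><s>Computer</s></genre>
    <price><f>44.95</f></price>
    <publish_date><d>2000-10-01</d></publish_date>
    <description><s>An in-depth look at creating applications with XML.</s></description>
  </book>
</catalog>"""

root = etree.fromstring(XML)

# find() returns the first matching element, or None if there is no match.
book = root.find('.//book[@id="bk101"]')

# findtext() returns the text of the first matching descendant path.
price = book.findtext('price/f')
description = book.findtext('description/s')

print(price)
print(description)
```

find() and findtext() accept only a small subset of XPath, so expressions that use text() or functions still need xpath().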

Terry Brown

9:02 a.m.

On Thu, 21 Apr 2011 08:26:24 -0400 Mag Gam <magawake@gmail.com> wrote:

http://www.w3.org/TR/xpath/
http://www.w3schools.com/xpath/default.asp

from lxml import etree
dom = etree.parse('d.xml')
# price of the book whose genre is "Computer"
p = dom.xpath('//book[genre/s="Computer"]/price/f/text()')[0]
# price and description of the book with id "bk101"
bk = dom.xpath('//book[@id="bk101"]')[0]
p2 = bk.xpath('.//price/f/text()')[0]
d = bk.xpath('.//description/s/text()')[0]
print(p, p2, d)

It might be better to do something like

d = ' '.join(bk.xpath('.//description//text()'))

to collect all the text under the description node; it depends on whether there's ever more than one <s/>, etc.

Cheers -Terry
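To make the last point concrete, here is a minimal sketch (assuming Python 3 with lxml installed) of how the two approaches differ when a description holds more than one <s> element. The two-<s> document is a hypothetical variation on the sample, invented for illustration; the original file has only one.

```python
from lxml import etree

# Hypothetical variant of the sample: the description is split over two <s>
# elements, with no whitespace between them.
XML = b"""<catalog>
  <book id="bk101">
    <description><s>An in-depth look</s><s>at creating applications with XML.</s></description>
  </book>
</catalog>"""

root = etree.fromstring(XML)
bk = root.xpath('//book[@id="bk101"]')[0]

# Taking [0] keeps only the text of the first <s> element.
first_only = bk.xpath('.//description/s/text()')[0]

# Joining every text node under <description> keeps all of it.
all_text = ' '.join(bk.xpath('.//description//text()'))

print(first_only)
print(all_text)
```

If the <s> elements were separated by line breaks or indentation, those whitespace-only text nodes would be joined in as well, so some stripping may be needed on real data.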
