getiterator vs xpath question - lxml - The Python XML Toolkit

31 Jul 2012

I am misunderstanding the difference between these two code blocks. I thought they would produce the same result.
I want to find every element in the tree that has an 'id' or a 'name' attribute so I can store that attribute's value.
(I'm link-checking a static HTML site.)

Starting out:
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> tree = etree.parse('ugdet17.htm', parser=parser)
Then, using getiterator():
>>> for elem in tree.getiterator():
...     if elem.tag == 'div':
...         if elem.get('class') == 'section':
...             print(elem.attrib)
...
{'class': 'section', 'id': 'ugseldet'}
And I *thought* the xpath version would return the same thing:
>>> for div in tree.xpath('//div[@class="section"]'):
...     print(div.attrib)
...
{'class': 'section', 'id': 'ugseldet'}
{'class': 'section', 'id': 'ugstep'}
So how do I go through the tree and get each id or name attribute value?

thanks,
--Tim Arnold
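
One possible way to collect every 'id' or 'name' value, sketched under assumptions: the file 'ugdet17.htm' is not shown in the post, so SAMPLE_HTML below is a made-up stand-in, and the helper names collect_anchors_xpath and collect_anchors_iter are hypothetical. The sketch gathers the values once with a single XPath query and once by walking the tree with iter(), then prints both so they can be compared. It does not try to explain why the two loops in the post gave different results on the real file.

```python
from lxml import etree

# Hypothetical stand-in for 'ugdet17.htm'; the real file is not shown in the post.
SAMPLE_HTML = """
<html><body>
  <div class="section" id="ugseldet"><a name="top">Top</a></div>
  <div class="section" id="ugstep"><p>No anchor here</p></div>
</body></html>
"""


def collect_anchors_xpath(root):
    """Collect id/name values using one XPath query."""
    anchors = set()
    # '//*[@id or @name]' selects every element carrying either attribute.
    for elem in root.xpath('//*[@id or @name]'):
        for attr in ('id', 'name'):
            value = elem.get(attr)
            if value is not None:
                anchors.add(value)
    return anchors


def collect_anchors_iter(root):
    """Collect id/name values by walking the whole tree with iter()."""
    anchors = set()
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(elem.tag, str):
            continue
        for attr in ('id', 'name'):
            value = elem.get(attr)
            if value is not None:
                anchors.add(value)
    return anchors


root = etree.fromstring(SAMPLE_HTML, parser=etree.HTMLParser())
xpath_anchors = collect_anchors_xpath(root)
iter_anchors = collect_anchors_iter(root)
print(sorted(xpath_anchors))
print(sorted(iter_anchors))
```

On this small sample the two helpers should return the same set. For a link checker, a set is a convenient container because each link target then only needs a membership test.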
