getiterator vs xpath question

I am misunderstanding the difference between these two code blocks; I thought they would give the same result. I want to find every element in the tree that has an 'id' or a 'name' attribute so I can store that attribute's value (I'm link-checking a static HTML site). Starting out:
Then using getiterator(),
And I *thought* the xpath would return the same thing.
So how do I go through the tree and get each id or name attribute value?

Thanks,
--Tim Arnold

Tim Arnold, 01.08.2012 00:48:
Have a look at the link iterator in lxml.html. It also handles references and links in CSS, for example.
I don't see the difference, either. Could you put the HTML document on a web server somewhere so that others can try to reproduce it? Which of the two results is the one you expected? Is this taken from a single Python session, without reparsing in between?
> So how do I go through the tree and get each id or name attribute value?
I'd use this code:

    for elem in tree.iter('div'):
        if elem.get('class') == 'section':
            print(elem.attrib)

Likely faster than any of the above.

Stefan
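
For the question as originally asked (collecting every id or name value, whatever the tag), a minimal sketch along the same lines might look like the following. The sample markup and the helper name collect_anchors are made up for illustration and do not come from the thread; the stdlib fallback import is only there so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree offers the same iter()/get() calls
    import xml.etree.ElementTree as etree

# Hypothetical sample document, not the poster's real HTML
SAMPLE = """<html><body>
<div class="section" id="intro"><a name="top"/></div>
<div class="section"><p id="p1">text</p></div>
<span>no anchor here</span>
</body></html>"""


def collect_anchors(root):
    """Return all 'id' and 'name' attribute values in document order."""
    anchors = []
    for elem in root.iter():
        # lxml also yields comments and processing instructions from iter();
        # their .tag is not a string, so skip them
        if not isinstance(elem.tag, str):
            continue
        for attr in ('id', 'name'):
            value = elem.get(attr)
            if value is not None:
                anchors.append(value)
    return anchors


root = etree.fromstring(SAMPLE)
anchors = collect_anchors(root)
print(anchors)
```

With lxml, an XPath expression such as `//*[@id or @name]` should select the same elements, so the two approaches can be cross-checked on a single parsed tree.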
