getiterator vs xpath question
I am misunderstanding the difference between these two code blocks. I thought they would have the same result. I want to find every element in the tree that has an 'id' or a 'name' attribute so I can store that attribute. (I'm link-checking a static HTML site.) Starting out:
Then using getiterator(),
And I *thought* the xpath would return the same thing.
So how do I go through the tree and get each id or name attribute value?

Thanks,
--Tim Arnold
Tim Arnold, 01.08.2012 00:48:
Have a look at the link iterator in lxml.html. It also handles references and links in CSS, for example.
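A minimal sketch of that link iterator, assuming it refers to `iterlinks()` on elements parsed with `lxml.html`. The sample page and its file names are made up for illustration; they are not from the original post.

```python
from lxml import html

# Hypothetical sample page; the site discussed in the thread is not shown.
PAGE = """
<html>
  <head><link rel="stylesheet" href="style.css"></head>
  <body>
    <a href="intro.html#setup">Setup</a>
    <img src="logo.png">
    <div style="background: url(bg.png)">text</div>
  </body>
</html>
"""

doc = html.fromstring(PAGE)

# iterlinks() yields (element, attribute, link, pos) tuples, one per link
links = [(el.tag, attr, link) for el, attr, link, pos in doc.iterlinks()]

for tag, attr, link in links:
    print(tag, attr, link)
```

For a link checker, the fragment part of each link (after the `#`) could then be compared against the collected id and name values of the target page.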
I don't see the difference, either. Could you put the HTML document on a web server somewhere so that others can try to reproduce it? Which of the two results is the one you expected? Is this taken from a single Python session, without reparsing in between?
> So how do I go through the tree and get each id or name attribute value?
I'd use this code:

    for elem in tree.iter('div'):
        if elem.get('class') == 'section':
            print(elem.attrib)

Likely faster than any of the above.

Stefan
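For the more general question (every id or name value, on any element), here is a sketch that collects the values both ways so the two results can be compared directly. The sample markup and the helper names `anchors_by_iter` and `anchors_by_xpath` are assumptions for illustration, not the code from the original post.

```python
from lxml import etree, html

# Hypothetical sample markup standing in for one page of the site.
SAMPLE = """
<html><body>
  <div class="section" id="intro"><a name="top"></a>Intro</div>
  <div class="section"><p id="p1">text</p></div>
  <span>no anchor here</span>
</body></html>
"""

doc = html.fromstring(SAMPLE)


def anchors_by_iter(root):
    """Walk all elements and collect their id and name attribute values."""
    found = []
    # Passing etree.Element restricts iteration to element nodes
    # (comments and processing instructions are skipped).
    for elem in root.iter(etree.Element):
        for key in ('id', 'name'):
            value = elem.get(key)
            if value is not None:
                found.append(value)
    return found


def anchors_by_xpath(root):
    """Select the id and name attribute values with one XPath union."""
    return [str(value) for value in root.xpath('//@id | //@name')]


by_iter = anchors_by_iter(doc)
by_xpath = anchors_by_xpath(doc)

print(sorted(by_iter))
print(sorted(by_xpath))
```

The results are compared sorted because the order of two attributes on the same element is not something this sketch relies on. If the two lists differ for a real document, the points raised above (reparsing between runs, or a different starting node) would be the first things to check.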
participants (2)
- Stefan Behnel
- Tim Arnold