
Hi,
then I could trigger an event with elem.tag but I started the script like this:
    from lxml import etree

    file = "file.html"
    parser = etree.HTMLParser()
    # etree.parse() is the parsing function; etree.parser does not exist
    tree = etree.parse(file, parser)
And I started to navigate inside the file with xpath:
    for document in tree.xpath('/html/body/table'):
        [...]
        for data in document.xpath("./tr/td"):
            ... here are my data after <br><br>
Now I am looking for something like data.iter() on the xpath result, but that does not seem to be possible.
How can I get a tree (not via fromstring) starting from where I am during the parsing?
Not sure I understand. After parsing you have a tree.

If you want to control events during parsing, while building the tree, you should probably have a look at http://lxml.de/parsing.html#iterparse-and-iterwalk

If you want to iterate through XPath results you'll need to take care of what your XPath results actually are (please look at the lxml xpath docs): etree.Element results have an iter() method themselves:
    >>> tree = etree.fromstring('<root><sub><subsub>bla</subsub></sub></root>')
    >>> tree.xpath('//subsub')
    [<Element subsub at 0x7fb7a43c1190>]
    >>> tree.xpath('//subsub')[0]
    <Element subsub at 0x7fb7a43c1190>
    >>> tree.xpath('//subsub')[0].iter()
    <lxml.etree.ElementDepthFirstIterator object at 0x7fb7a43c11e0>
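
Applied to the table layout from the question, this could look roughly like the sketch below. The HTML sample and the helper name text_after_breaks are made up for illustration, and it assumes the wanted text directly follows a <br> inside each cell, so that it ends up in the .tail of that <br> element:

```python
from lxml import etree

# Hypothetical sample standing in for file.html; the real layout may differ.
HTML = """
<html><body><table>
<tr><td>label<br><br>first value</td></tr>
<tr><td>other<br><br>second value</td></tr>
</table></body></html>
"""

tree = etree.fromstring(HTML, etree.HTMLParser())


def text_after_breaks(cell):
    """Collect the non-empty tail text of every <br> inside one cell."""
    # cell is an Element returned by xpath(), so it has iter().
    return [
        br.tail.strip()
        for br in cell.iter("br")
        if br.tail and br.tail.strip()
    ]


values = []
for table in tree.xpath("/html/body/table"):
    for cell in table.xpath("./tr/td"):
        values.extend(text_after_breaks(cell))

print(values)
```

Each td returned by xpath() is an ordinary Element, so iter() walks its subtree without re-parsing anything.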
While string results do not:
    >>> tree.xpath('//subsub/text()')
    ['bla']
    >>> tree.xpath('//subsub/text()')[0].iter()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: '_ElementStringResult' object has no attribute 'iter'
Caution: Strings are iterable themselves though:
    >>> iter(tree.xpath('//subsub/text()')[0])
    <iterator object at 0x7fb7a43bb390>
    >>> list(iter(tree.xpath('//subsub/text()')[0]))
    ['b', 'l', 'a']
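
One way to handle both kinds of result in the same loop is to check what you got before calling iter(). The following is only a sketch: subtree_tags is a made-up helper name, and it relies on lxml's default "smart strings", whose getparent() method leads back to the element that owns the text:

```python
from lxml import etree

tree = etree.fromstring("<root><sub><subsub>bla</subsub></sub></root>")


def subtree_tags(result):
    """Return the tags reachable from one XPath result (illustrative helper)."""
    if etree.iselement(result):
        # Element result: iterate from the element itself.
        start = result
    elif isinstance(result, str) and hasattr(result, "getparent"):
        # Smart string result: go back to the element that holds the text.
        start = result.getparent()
    else:
        raise TypeError("unsupported XPath result: %r" % (result,))
    return [element.tag for element in start.iter()]


from_element = subtree_tags(tree.xpath("//sub")[0])
from_text = subtree_tags(tree.xpath("//subsub/text()")[0])

print(from_element)
print(from_text)
```

If smart strings are switched off (smart_strings=False on the XPath call), text results are plain strings without getparent(), and the helper raises TypeError for them.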
Holger