On Fri, Jan 25, 2019 at 6:15 PM Stefan Behnel <stefan_ml@behnel.de> wrote:
> def iter_xml_records(file_path, record_xpath, column_xpaths):
>     doc = ET.iterparse(file_path, events=('start', 'end'))
>     try:
>         _, root = next(doc)
>     except StopIteration:
>         return
>     start_tag = None
>     for event, element in doc:
>         if event == 'start' and start_tag is None:
>             start_tag = element.tag
>         if event == 'end' and element.tag == start_tag:
>             for record_node in record_xpath(root):
>                 yield [xp(record_node) for xp in column_xpaths]
>             start_tag = None
>             root.clear()

> Well, if you need all elements, then don't throw them away. You are
> clearing the root node, with whatever content it has up to that point. But
> you may still need that content.
>
> Read the warning in the docs:
>
> https://lxml.de/parsing.html#modifying-the-tree

OK, thanks for that pointer. I'll have to refine the code.
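
One possible direction, as a rough sketch only: read each record from its own element on the 'end' event and clear just that element, instead of clearing the root. The sketch assumes the standard library's xml.etree.ElementTree rather than lxml, and plain child tag names instead of compiled XPath objects. The names iter_records, record_tag and column_tags are illustrative and not taken from the code above.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag, column_tags):
    """Yield one list of column texts per completed record element.

    Assumption: each column is a direct child of the record element.
    """
    for _, element in ET.iterparse(source, events=("end",)):
        if element.tag == record_tag:
            # The 'end' event guarantees the record's subtree is complete.
            yield [element.findtext(tag) for tag in column_tags]
            # Free only this record's content; the root and any elements
            # still being parsed are left untouched.
            element.clear()


# Hypothetical sample document, for illustration only.
sample = b"<rows><row><a>1</a><b>x</b></row><row><a>2</a><b>y</b></row></rows>"
rows = list(iter_records(io.BytesIO(sample), "row", ["a", "b"]))
```

The cleared records stay attached to their parent as empty shells, so memory still grows slowly on very large files. Removing them as well would need extra care about which siblings are safe to drop.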

Wietse