Iterparse raises TypeError on attempt to clean up preceding siblings
Hello,

While following the iterparse / modifying the tree docs (https://lxml.de/4.5/parsing.html#modifying-the-tree) in attempt to clean up preceding siblings, code raises a TypeError: 'NoneType' object does not support item deletion when the XML contains a comment preceding the root element.

Please advise on a workaround for this use case if this is not a potential bug. See below reproducible example. Thank you for your time and review.

```python
from io import BytesIO
import lxml.etree as ET

xml_txt = '''\
<!-- it will choke on this comment -->
<issue>
  <type>BUG</type>
</issue>
'''.encode('utf-8')

iterparse_columns = { "issue": ["type"] }
data = []

for event, elem in ET.iterparse(BytesIO(xml_txt), events=('start', 'end')):
    node = next(iter(iterparse_columns))
    curr_elem = (
        elem.tag.split('}')[1] if '}' in elem.tag else elem.tag
    )

    if event == 'start':
        if curr_elem == node:
            row = {}

    for col in iterparse_columns[node]:
        if curr_elem == col:
            row[col] = (
                elem.text.strip() if elem.text is not None else elem.text
            )
        if col in elem.attrib:
            row[col] = elem.attrib[col].strip()

    if event == 'end':
        if curr_elem == node:
            data.append(row)

        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]

print(data)
```

Full traceback:

```
Traceback (most recent call last):
  File "iterparse_parent_comment_reprex.py", line 43, in <module>
    del elem.getparent()[0]
TypeError: 'NoneType' object does not support item deletion
```
I see one fix is to also check if `elem.getparent() is not None`. Thoughts?

    elem.clear()
    while elem.getprevious() is not None and elem.getparent() is not None:
        del elem.getparent()[0]
On June 23, 2022 11:20:59 PM UTC, Parfait G <parfait.gasana@gmail.com> wrote:
The parent won't change during the loop, so it's enough to check it once before the loop.

Also, there is only one element without parent, that's the root element. Maybe you can skip that altogether in your processing? It should be the first item returned by the iterator that you got through .iter(). Just call next() on it once.

Stefan
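For reference, the first suggestion (look up the parent once, before the loop) might look roughly like the sketch below. The helper names `clear_with_preceding_siblings` and `parse_tags` are made up for illustration and are not part of lxml; the sample document is a trimmed version of the one in the original report.

```python
from io import BytesIO

import lxml.etree as ET

xml_txt = b"""<!-- comment before the root -->
<issue>
  <type>BUG</type>
</issue>
"""


def clear_with_preceding_siblings(elem):
    """Hypothetical helper: clear elem and drop its already-parsed siblings."""
    elem.clear()
    # The parent does not change while siblings are deleted, so fetch it once.
    parent = elem.getparent()
    if parent is None:
        # Only the root element has no parent. It can still report a
        # preceding sibling (the top-level comment), so stop here.
        return
    while elem.getprevious() is not None:
        del parent[0]


def parse_tags(xml_bytes):
    """Hypothetical driver: collect tag names while freeing memory."""
    tags = []
    for _event, elem in ET.iterparse(BytesIO(xml_bytes), events=("end",)):
        tags.append(elem.tag)
        clear_with_preceding_siblings(elem)
    return tags


print(parse_tags(xml_txt))
```

With the parent check hoisted out of the loop, the root's `end` event simply returns early instead of attempting `del None[0]`.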
participants (3)
- Parfait G
- Parfait Gasana
- Stefan Behnel