New GitHub issue #118687 from danifus:<br>
<hr>
<pre>
# Feature or enhancement
### Proposal:
This proposal improves the performance of writing XML for trees whose tag names are predominantly strings. This comes at the cost of performance for trees whose tags are predominantly `QName`s.
As far as I'm aware, using a `str` for the tag name is more common than using a `QName`, so we should optimise for that scenario. For example, parsing an XML document with `ElementTree` returns `Element`s whose tags are all strings.
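A quick way to check the claim about parsed trees is sketched below. The sample document and its `urn:example` namespace are made up for illustration; even with a namespace in play, the parser stores each tag as a plain `str` in `{uri}local` form rather than as a `QName`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document that uses a default namespace.
SAMPLE = '<root xmlns="urn:example"><child/><child attr="1"/></root>'

root = ET.fromstring(SAMPLE)

# Collect every tag in the tree and the set of Python types used for them.
tags = [elem.tag for elem in root.iter()]
tag_types = {type(tag) for tag in tags}

print(tags)
print(tag_types)
```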
Reordering the following `if` block so that the `isinstance(tag, str)` check comes first gives a performance improvement of 1-1.5% on a tree parsed from a file of about 300 KB:
```
--- a/Lib/xml/etree/ElementTree.py
+++ b/Lib/xml/etree/ElementTree.py
@@ -827,12 +827,12 @@ def add_qname(qname):
     # populate qname and namespaces table
     for elem in elem.iter():
         tag = elem.tag
-        if isinstance(tag, QName):
-            if tag.text not in qnames:
-                add_qname(tag.text)
-        elif isinstance(tag, str):
+        if isinstance(tag, str):
             if tag not in qnames:
                 add_qname(tag)
+        elif isinstance(tag, QName):
+            if tag.text not in qnames:
+                add_qname(tag.text)
         elif tag is not None and tag is not Comment and tag is not PI:
             _raise_serialization_error(tag)
         for key, value in elem.items():
```
Because this check sits inside a loop that traverses the entire XML tree, the relative improvement grows with the size of the tree: for larger trees, the traversal accounts for a larger share of the total time than the fixed setup code.
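For anyone who wants to measure the effect in isolation, here is a minimal sketch of a micro-benchmark. It is not the actual `_namespaces` code: `collect_qname_first`, `collect_str_first` and `build_tree` are hypothetical helpers that only mimic the branch order, and the numbers will vary by machine and Python version.

```python
import timeit
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import QName


def collect_qname_first(root):
    """Collect tag names, testing for QName before str (current order)."""
    seen = set()
    for elem in root.iter():
        tag = elem.tag
        if isinstance(tag, QName):
            seen.add(tag.text)
        elif isinstance(tag, str):
            seen.add(tag)
    return seen


def collect_str_first(root):
    """Collect tag names, testing for str before QName (proposed order)."""
    seen = set()
    for elem in root.iter():
        tag = elem.tag
        if isinstance(tag, str):
            seen.add(tag)
        elif isinstance(tag, QName):
            seen.add(tag.text)
    return seen


def build_tree(n=10_000):
    """Build a synthetic tree whose tags are all plain strings."""
    root = ET.Element("root")
    for i in range(n):
        ET.SubElement(root, f"item{i % 10}")
    return root


if __name__ == "__main__":
    root = build_tree()
    for fn in (collect_qname_first, collect_str_first):
        # Take the best of several repeats to reduce timing noise.
        best = min(timeit.repeat(lambda: fn(root), number=10, repeat=5))
        print(f"{fn.__name__}: {best:.4f}s")
```

Both helpers return the same set of names for any tree, so the reordering only changes which `isinstance` check runs first, not the result.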
### Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere.
### Links to previous discussion of this feature:
_No response_
</pre>
<hr>
<a href="https://github.com/python/cpython/issues/118687">View on GitHub</a>
<p>Labels: type-feature</p>
<p>Assignee: </p>