How to count the number of times some tag occurs in an XML file?
The simplest way is:

    counter = 0
    for i in tree.iter(tag='sometag'):
        counter += 1

But please note that I don't use 'i' here. Is there a cleaner way?

P.
Simon Sapin, 27.04.2012 12:07:
As a general remark, it's a common convention in Python to mark the loop variable as unused by naming it "_", i.e. "for _ in ...". You could also use this:

    sum(1 for _ in tree.iter(tag='sometag'))

But I don't find it clearer.
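The two iteration-based variants can be sketched side by side. This is only an illustration: the sample document is made up, and it uses the standard library's xml.etree.ElementTree so that it runs without extra dependencies (lxml.etree accepts the same iter() call).

```python
import xml.etree.ElementTree as ET  # lxml.etree accepts the same iter() call

# Hypothetical sample document, not taken from the original thread.
XML = "<root><sometag/><other><sometag/></other><sometag/></root>"
root = ET.fromstring(XML)

# Explicit loop, with "_" marking the loop variable as unused.
counter = 0
for _ in root.iter("sometag"):
    counter += 1

# Equivalent one-liner using a generator expression.
total = sum(1 for _ in root.iter("sometag"))

print(counter, total)
```

Both variants walk the matching elements once, so they should give the same count.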
How about this?
tree.xpath('count(//sometag)')
And if you want to wrap it in a callable, this should work:

    count_tags = etree.XPath('count(//*[local-name() = $tag])')
    count_tags(tree, tag='sometag')
(You would have to check if it actually is correct or faster.)
It likely is because it doesn't need to instantiate the elements in order to count them, but it would still depend on the number of elements being found. Tree iteration is blazingly fast in lxml, often substantially faster than the more generic XPath evaluation.

Stefan
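For anyone who wants to check the XPath variants against iteration, here is a rough sketch, assuming lxml is installed and using a made-up sample document. XPath 1.0 does not allow a variable in place of an element name, so the callable here matches on local-name() instead. That is a swapped-in workaround, and it also matches namespaced elements with the same local name.

```python
from lxml import etree

# Hypothetical sample document, not taken from the original thread.
XML = b"<root><sometag/><other><sometag/></other><sometag/></root>"
tree = etree.ElementTree(etree.fromstring(XML))

# Count by tree iteration.
by_iter = sum(1 for _ in tree.iter("sometag"))

# Count with a one-off XPath expression; XPath numbers come back as floats.
by_xpath = tree.xpath("count(//sometag)")

# Precompiled callable; the tag name is passed as an XPath variable
# and compared against each element's local name.
count_tags = etree.XPath("count(//*[local-name() = $tag])")
by_callable = count_tags(tree, tag="sometag")

print(by_iter, int(by_xpath), int(by_callable))
```

Which one is faster on a real document is best measured, for example with the timeit module, rather than assumed.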
participants (3): Piotr Oh, Simon Sapin, Stefan Behnel