Hi. On my blog I mentioned that I was building up a tree from the
portal catalog and that ElementTree's SimpleXMLWriter was 2.5 times
faster than doing it via lxml. Martijn asked me to post the code I
used here, since he didn't think the difference should be that large.
He was right, sorry! In the original comparison, I turned the pathindex
string into subnodes for the lxml version but didn't do the same for
the XMLWriter version.
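The asymmetry in the original run came from the extra path subnodes on the lxml side. As a rough illustration only, here is one way a path string such as "/Members/automaticforthepeople/junk" could be expanded into nested child elements. The element name "node" and the "name" attribute are hypothetical, because the post doesn't say which layout was used. The sketch also uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages; the Element/SubElement calls have the same shape in lxml.etree.

```python
import xml.etree.ElementTree as ET


def add_path_nodes(parent, path):
    """Expand a slash-separated path into nested child elements.

    Hypothetical layout: one <node name="..."> element per path segment,
    each nested inside the previous one.
    """
    current = parent
    for segment in path.strip("/").split("/"):
        if not segment:  # skip empty segments, e.g. from "//" or ""
            continue
        current = ET.SubElement(current, "node", name=segment)
    return current  # the innermost (deepest) element


entry = ET.Element("entry", id="1092309103910930")
leaf = add_path_nodes(entry, "/Members/automaticforthepeople/junk")
print(ET.tostring(entry, encoding="unicode"))
```

With this layout each entry gains three extra elements, so with 5000 entries the lxml run was building 15,000 elements that the XMLWriter run never produced. The two runs were simply not doing the same amount of work.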
In a like-for-like comparison, lxml is actually faster: only slightly
at 50 entries, but by a clear margin at 5000. For those interested, my
test is below, with the results in the docstring. I'll post a
correction to my blog post. (Note: I'm using the scoder2 branch.)
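For a rough sense of scale, the speedups implied by the docstring results in the script below can be worked out directly. The numbers are copied from that docstring; nothing here is re-measured.

```python
# Average seconds per run, copied from the docstring results.
results = {
    50: {"xmlwriter": 0.0206758022308, "lxml": 0.0190546989441},
    5000: {"xmlwriter": 1.80105571747, "lxml": 0.999091792107},
}

# Ratio > 1 means the lxml version took less time than XMLWriter.
speedup = {n: r["xmlwriter"] / r["lxml"] for n, r in results.items()}

for n, ratio in sorted(speedup.items()):
    print("%5d entries: lxml is %.2fx as fast" % (n, ratio))
```

That works out to about 1.09x at 50 entries and about 1.80x at 5000, so the gap grows with the size of the catalog.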
--Paul
"""
A simple speed comparison of SimpleXMLWriter versus lxml for DOM
node creation.
Results for 50 entries:
0.0206758022308 ...XMLWriter average time
0.0190546989441 ...Etree average time
Results for 5000 entries:
1.80105571747 ...XMLWriter average time
0.999091792107 ...Etree average time
"""
from time import time
from lxml.etree import Element
import cStringIO
from elementtree.SimpleXMLWriter import XMLWriter
entries = 5000
entry = {
    "id": "1092309103910930",
    "creator": "automaticforthepeople",
    "title": "An entry in the portal_catalog",
    "description": "A longer textual description would go here",
    "created": "12/25/2005 00:00:00 GMT+1",
    "is_folderish": "1",
    "portal_type": "ATFolder",
    "review_state": "published",
    "path": "/Members/automaticforthepeople/junk",
}

def makeXMLWriter():
    # Stream the catalog as serialized XML into an in-memory file.
    f = cStringIO.StringIO()
    tree = XMLWriter(f)
    root = tree.start("catalog")
    for i in range(entries):
        tree.start("entry",
                   id=entry['id'],
                   creator=entry['creator'],
                   title=entry['title'],
                   description=entry['description'],
                   created=entry['created'],
                   is_folderish=entry['is_folderish'],
                   portal_type=entry['portal_type'],
                   review_state=entry['review_state'],
                   )
        tree.end()  # closes <entry>
    tree.close(root)  # closes open elements up to and including <catalog>

def makeEtree():
    # Build the same catalog as an in-memory lxml tree (not serialized).
    root = Element("catalog")
    for i in range(entries):
        item = Element("entry")
        item.set("id", entry["id"])
        item.set("creator", entry["creator"])
        item.set("title", entry["title"])
        item.set("description", entry["description"])
        item.set("created", entry["created"])
        item.set("is_folderish", entry["is_folderish"])
        item.set("portal_type", entry["portal_type"])
        item.set("review_state", entry["review_state"])
        root.append(item)

def main():
    repeat = 10
    # Time the XMLWriter version
    start1 = time()
    for i in range(repeat):
        makeXMLWriter()
    print (time() - start1)/repeat, "...XMLWriter average time"
    # Time the lxml version
    start2 = time()
    for i in range(repeat):
        makeEtree()
    print (time() - start2)/repeat, "...Etree average time"

if __name__ == "__main__": main()
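One asymmetry remains in the script above: makeXMLWriter writes serialized XML into a StringIO, while makeEtree only builds the tree in memory and never serializes it. Below is a sketch of a harness that times both approaches with timeit, with serialization optional on the tree side. It uses standard-library stand-ins so it runs without extra packages: xml.sax.saxutils.XMLGenerator plays the streaming-writer role (it is not SimpleXMLWriter), and xml.etree.ElementTree stands in for lxml.etree. Porting build_catalog to lxml should only need a different import, but that is an untested assumption.

```python
import io
import timeit
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

ENTRIES = 5000
ENTRY = {
    "id": "1092309103910930",
    "creator": "automaticforthepeople",
    "title": "An entry in the portal_catalog",
    "description": "A longer textual description would go here",
    "created": "12/25/2005 00:00:00 GMT+1",
    "is_folderish": "1",
    "portal_type": "ATFolder",
    "review_state": "published",
}
# Same attributes as the original test; "path" is deliberately left out.
ATTRS = ["id", "creator", "title", "description", "created",
         "is_folderish", "portal_type", "review_state"]


def stream_catalog(n=ENTRIES):
    """Write the catalog as a stream of events (stand-in for XMLWriter)."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("catalog", {})
    attrs = {k: ENTRY[k] for k in ATTRS}
    for _ in range(n):
        gen.startElement("entry", attrs)
        gen.endElement("entry")
    gen.endElement("catalog")
    gen.endDocument()
    return out.getvalue()


def build_catalog(n=ENTRIES, serialize=True):
    """Build an in-memory tree; optionally serialize it to a string too."""
    root = ET.Element("catalog")
    for _ in range(n):
        item = ET.SubElement(root, "entry")
        for k in ATTRS:
            item.set(k, ENTRY[k])
    if serialize:
        return ET.tostring(root, encoding="unicode")
    return root


def average(func, repeat=10):
    """Average wall-clock seconds per call over `repeat` calls."""
    return timeit.timeit(func, number=repeat) / repeat


if __name__ == "__main__":
    print(average(stream_catalog), "...streaming writer, serialized")
    print(average(lambda: build_catalog(serialize=False)), "...tree, build only")
    print(average(build_catalog), "...tree, build + serialize")
```

Whether the build-only or the build + serialize figure is the right one to set against the streaming writer depends on whether the tree itself is needed afterwards or only its serialized text.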