Hi Steve,

Steve Howe wrote:
> Thursday, May 11, 2006, 5:05:28 AM, you wrote:
>> I hope you meant (c)ElementTree, right? I posted some pretty interesting benchmark results on that lately. You can really watch memory usage increase MB by MB... If you meant lxml, you should redo the test and make sure there was no swapping involved. These kinds of benchmarks should always read from RAM.
>
> No, I meant lxml, and yes, I could have made it read from RAM, but I think it did swap. It was not a very controlled test, I admit, just something quick I ran at my Python prompt. I just ran the test again and the results were similar. There is plenty of RAM available, however.

Hmm, interesting. Could you run the I/O tests from the benchmark suite (trunk version) and post the results? My results here are that lxml is about 20-50 times faster on serialization than cET or ET. I would be surprised if that were so different on your machine. Try:

  cd lxml
  python bench.py -i -a tostring_utf8 tostring_utf16 tostring_utf8_unicode_XML write_utf8_parse_stringIO

(the latter all in one line, '-i' adds 'src' to the PYTHONPATH, '-a' runs with lxml, cET and ET if installed)

It's going to take a while and the output is rather lengthy.

The benchmarks run this, which is more or less what we are talking about here:

----------------------------------
    @with_text(text=True, utext=True)
    def bench_tostring_utf8(self, root):
        self.etree.tostring(root, 'UTF-8')

    @with_text(text=True, utext=True)
    def bench_tostring_utf16(self, root):
        self.etree.tostring(root, 'UTF-16')

    @with_text(text=True, utext=True)
    def bench_tostring_utf8_unicode_XML(self, root):
        xml = unicode(self.etree.tostring(root, 'UTF-8'), 'UTF-8')
        self.etree.XML(xml)

    @with_text(text=True, utext=True)
    def bench_write_utf8_parse_stringIO(self, root):
        f = StringIO()
        self.etree.ElementTree(root).write(f, 'UTF-8')
        f.seek(0)
        self.etree.parse(f)
----------------------------------

Thanks,
Stefan
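
P.S. For a quick check at the prompt that stays entirely in RAM, something along these lines might do. It is only a rough sketch and not part of bench.py: the helper names (build_tree, best_time, roundtrip) and the tree shape are made up for illustration, it uses the standard library's xml.etree.ElementTree as a stand-in for whichever implementation you want to measure, and it is written for a current Python rather than the one the benchmark suite targets.

```python
import io
import time
import xml.etree.ElementTree as ET  # stand-in; swap in the implementation under test


def build_tree(children=1000, text="some text"):
    """Build a small tree entirely in memory (hypothetical shape and size)."""
    root = ET.Element("root")
    for i in range(children):
        child = ET.SubElement(root, "child", {"id": str(i)})
        child.text = text
    return root


def best_time(func, repeat=3):
    """Return the best wall-clock time of several runs, in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def roundtrip(root):
    """Serialize to an in-memory buffer and parse it back, with no disk I/O."""
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding="UTF-8")
    buf.seek(0)
    return ET.parse(buf).getroot()


if __name__ == "__main__":
    root = build_tree()
    t_ser = best_time(lambda: ET.tostring(root, encoding="UTF-8"))
    t_rt = best_time(lambda: roundtrip(root))
    print("tostring UTF-8: %.6f s" % t_ser)
    print("write + parse:  %.6f s" % t_rt)
```

Taking the best of a few runs should reduce the influence of a one-off stall, but it will not hide sustained swapping, so it is still worth keeping an eye on memory usage while it runs.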