Re: [lxml-dev] Re: Re: Python unicode string support in lxml

May 11, 2006

Hi Steve,

Steve Howe wrote:
> ...
> Thursday, May 11, 2006, 5:05:28 AM, you wrote:
>> ...
>> I hope you meant (c)ElementTree, right? I posted some pretty interesting
>> benchmark results on that lately. You can really watch how memory usage
>> increases MB by MB...
>> If you meant lxml, you should redo the test and make sure there was no
>> swapping involved. These kinds of benchmarks should always read from RAM.
> No, I meant lxml, and yes, I could have made it read from RAM, but I
> think it did swap. It was not a very controlled test, I admit, just
> something quick I did at my Python prompt. I just ran the test again
> and the results were similar. There is plenty of RAM available,
> however.

Hmm, interesting. Could you run the I/O tests from the benchmark suite (trunk
version) and post the results? In my runs here, lxml is about 20-50 times
faster at serialization than cET or ET. I would be surprised if that were
much different on your machine.

Try:

cd lxml

python bench.py -i -a tostring_utf8 tostring_utf16 tostring_utf8_unicode_XML
write_utf8_parse_stringIO

(the latter all in one line, '-i' adds 'src' to the PYTHONPATH, '-a' runs with
lxml, cET and ET if installed)

It's gonna take a while and the output is rather lengthy.

The benchmarks run the following, which is more or less what we are talking
about here:
----------------------------------
    @with_text(text=True, utext=True)
    def bench_tostring_utf8(self, root):
        self.etree.tostring(root, 'UTF-8')

    @with_text(text=True, utext=True)
    def bench_tostring_utf16(self, root):
        self.etree.tostring(root, 'UTF-16')

    @with_text(text=True, utext=True)
    def bench_tostring_utf8_unicode_XML(self, root):
        xml = unicode(self.etree.tostring(root, 'UTF-8'), 'UTF-8')
        self.etree.XML(xml)

    @with_text(text=True, utext=True)
    def bench_write_utf8_parse_stringIO(self, root):
        f = StringIO()
        self.etree.ElementTree(root).write(f, 'UTF-8')
        f.seek(0)
        self.etree.parse(f)
----------------------------------
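
For anyone who wants to try the same four operations outside the benchmark
suite, here is a rough standalone sketch. It is not part of bench.py: the
helper names build_tree and time_all and the test data are made up for
illustration. It assumes Python 3 and uses the standard library's
xml.etree.ElementTree as a stand-in, so its timings say nothing about lxml
itself.

```python
import functools
import io
import timeit
import xml.etree.ElementTree as ET  # stdlib stand-in, not lxml


def build_tree(children=1000, text="some text \u00e4\u00f6\u00fc"):
    """Build a flat tree with non-ASCII text (hypothetical test data)."""
    root = ET.Element("root")
    for i in range(children):
        child = ET.SubElement(root, "child", index=str(i))
        child.text = text
    return root


def tostring_utf8(root):
    # Serialize the tree to UTF-8 encoded bytes.
    return ET.tostring(root, encoding="UTF-8")


def tostring_utf16(root):
    # Serialize the tree to UTF-16 encoded bytes.
    return ET.tostring(root, encoding="UTF-16")


def tostring_utf8_unicode_xml(root):
    # Serialize to bytes, decode to a text string, then parse it again.
    xml = ET.tostring(root, encoding="UTF-8").decode("UTF-8")
    return ET.XML(xml)


def write_utf8_parse_bytesio(root):
    # Write to an in-memory file, rewind, and parse from it.
    f = io.BytesIO()
    ET.ElementTree(root).write(f, encoding="UTF-8")
    f.seek(0)
    return ET.parse(f)


def time_all(root, repeat=3):
    """Return the best wall-clock time in seconds for each operation."""
    funcs = (tostring_utf8, tostring_utf16,
             tostring_utf8_unicode_xml, write_utf8_parse_bytesio)
    results = {}
    for func in funcs:
        timer = functools.partial(func, root)
        results[func.__name__] = min(timeit.repeat(timer, number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    for name, seconds in time_all(build_tree()).items():
        print("%-28s %.6f s" % (name, seconds))
```

Everything here stays in memory, so swapping should not affect the numbers
unless the tree is made very large.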

Thanks,
Stefan

Stefan Behnel