convert xhtml back to html
stefan_ml at behnel.de
Fri Apr 25 08:16:57 CEST 2008
bryan rasmussen top-posted:
> On Thu, Apr 24, 2008 at 9:55 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> from lxml import etree
>> tree = etree.parse("thefile.xhtml")
>> tree.write("thefile.html", method="html")
> wow, that's pretty nice there.
> Just to know: what's the performance like on XML instances of 1 GB?
That's a pretty big file, although you didn't mention what kind of XML
language you want to handle and what you want to do with it.
lxml is pretty conservative in terms of memory, but the exact numbers depend
on your data. lxml holds the XML tree in memory,
which is a lot bigger than the serialised data. So, for example, if you have
2GB of RAM and want to parse a serialised 1GB XML file full of little
one-element integers into an in-memory tree, get prepared for lunch. With a
lot of long text string content instead, it might still fit.
However, lxml also has a couple of step-by-step and stream parsing APIs.
They might do what you want.
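
As a rough illustration of the stream parsing idea, here is a minimal sketch
based on iterparse(), which hands over each element as soon as it has been
parsed instead of building the whole tree first. The function name
sum_values, the <value> tag and the sample data are made up for this example,
and the fallback import of the standard library's ElementTree is only there
so the sketch also runs where lxml is not installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree


def sum_values(source, tag="value"):
    """Stream through `source` and add up the integer text of every <tag> element."""
    total = 0
    count = 0
    # "end" events fire once an element has been parsed completely.
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == tag:
            total += int(elem.text)
            count += 1
            # Drop the element's content so it does not pile up in memory.
            elem.clear()
    return count, total


# Hypothetical sample data standing in for a large file.
sample = b"<data>" + b"".join(b"<value>%d</value>" % i for i in range(1000)) + b"</data>"
count, total = sum_values(io.BytesIO(sample))
print(count, total)
```

Note that clear() only empties each element; the emptied elements stay
attached to the root, so for a really big file you would probably also want
to detach them from their parent as you go.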