[XML-SIG] Replicating DTD information using XMLFilterBase and XMLGenerator

James Sulak jsulak at gmail.com
Tue Jul 29 02:56:32 CEST 2008


Thanks, Stefan, for pointing me to lxml.  It looks like a good
alternative to SAX in this situation.  However, I'm a little confused
as to the best way to remove elements from the tree while keeping
their tail text.  This is what I have so far:

from lxml import etree

context = etree.iterparse("test.xml")

for event, element in context:
    for title in element.xpath("child::title"):
        element.remove(title)

Do I need to explicitly assign the tail text to either the parent or
the preceding sibling?  If so, what's the best way to accomplish that?
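
For reference, a minimal sketch of doing this by hand, assuming the tail
really does have to be moved explicitly (remove() detaches an element
together with its tail). The names remove_keeping_tail, strip_titles and
SAMPLE are made up for illustration, and findall("title") stands in for
xpath("child::title") so that the same code also runs on the standard
library's ElementTree when lxml is not installed:

```python
import io

try:
    from lxml import etree
except ImportError:  # fall back to the stdlib API, which shares these calls
    import xml.etree.ElementTree as etree

# Hypothetical sample input standing in for test.xml.
SAMPLE = b"<doc><p>before <title>Heading</title> after</p></doc>"


def remove_keeping_tail(parent, child):
    """Remove child from parent, moving child's tail text to its neighbour."""
    if child.tail:
        siblings = list(parent)
        position = siblings.index(child)
        if position > 0:
            # The tail follows the preceding sibling once child is gone.
            previous = siblings[position - 1]
            previous.tail = (previous.tail or "") + child.tail
        else:
            # No preceding sibling: the tail becomes part of the parent's text.
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def strip_titles(source):
    """Parse source and drop every <title> child while keeping its tail."""
    root = None
    # Default events are "end" only, so an element's children (and their
    # tails) are complete by the time the element itself is reported.
    for event, element in etree.iterparse(source):
        for title in element.findall("title"):
            remove_keeping_tail(element, title)
        root = element  # the last "end" event belongs to the root element
    return root


root = strip_titles(io.BytesIO(SAMPLE))
print(etree.tostring(root))
```

The idea is that the tail goes to the preceding sibling's tail if there is
one, and to the parent's text otherwise.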

Thanks,

-James


On Sun, Jul 27, 2008 at 3:38 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> James Sulak wrote:
>> I'm attempting to use xml.sax.saxutils.XMLFilterBase and XMLGenerator to
>> take an input XML document, filter out certain elements, and output
>> the result to a second XML file.  I have it mostly working, except
>> that I lose the DTD declaration and anything (processing instructions
>> or comments) before the root element.  I believe I'm supposed to be
>> using a LexicalHandler to get the information from the DTD, but I have
>> not been able to figure out how to do this, or how to integrate it
>> with the rest of the code.
>>
>> I'm pretty new at using Python (and SAX, for that matter) to work with
>> XML
>
> Try lxml's iterparse() instead of SAX. It will build an in-memory tree
> (including the DTD or its reference if you want, see the parser docs), but you
> can remove the unwanted elements from the tree while it parses. It's still
> pretty memory friendly and definitely a lot easier to work with than SAX.
>
> http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
> http://codespeak.net/lxml/tutorial.html#parsing-from-strings-and-files
>
> Stefan
>

