[XML-SIG] Replicating DTD information using XMLFilterBase and XMLGenerator
James Sulak
jsulak at gmail.com
Tue Jul 29 02:56:32 CEST 2008
Thanks, Stefan, for pointing me to lxml. It looks like a good
alternative to SAX in this situation. However, I'm a little confused
as to the best way to remove elements from the tree while keeping
their tail text. This is what I have so far:
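from lxml import etree

context = etree.iterparse("test.xml")  # default: one "end" event per element
for event, element in context:
    for title in element.xpath("child::title"):
        element.remove(title)  # the title's tail text is removed with it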
Do I need to explicitly assign the tail text to either the parent or
the preceding sibling? If so, what's the best way to accomplish that?
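A minimal sketch of the first option, assuming the removed element's tail should be joined onto whatever text precedes it: before calling `remove()`, append the tail to the preceding sibling's `tail`, or to the parent's `text` if the element is the first child. `remove_keep_tail` and `strip_titles` are hypothetical helper names. The import falls back to the standard library's `xml.etree.ElementTree`, which stores text and tail the same way. For that reason the sketch uses `findall("title")`, which works in both libraries, instead of lxml's `xpath()`.

```python
import io

try:
    from lxml import etree  # the library recommended in this thread
except ImportError:
    # Fallback so the sketch also runs without lxml; the stdlib
    # ElementTree stores text/tail the same way.
    import xml.etree.ElementTree as etree


def remove_keep_tail(parent, child):
    """Remove child from parent without losing child's tail text.

    The tail is appended to the preceding sibling's tail, or to
    parent.text when child is the first child.
    """
    if child.tail:
        siblings = list(parent)
        pos = siblings.index(child)
        if pos > 0:
            prev = siblings[pos - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)  # child's tail leaves with it; it was copied above


def strip_titles(source):
    """Parse source incrementally, dropping <title> children; return the root."""
    root = None
    # With the default events=("end",), an element is reported only after
    # its children have been parsed, so its <title> children already exist.
    for _event, element in etree.iterparse(source):
        for title in element.findall("title"):
            remove_keep_tail(element, title)
        root = element  # the root element's "end" event comes last
    return root


if __name__ == "__main__":
    sample = b"<doc><p>one<title>T</title> two</p></doc>"
    print(etree.tostring(strip_titles(io.BytesIO(sample))))
```

Current lxml releases also provide `etree.strip_elements(tree_or_element, *tag_names, with_tail=True)`. Passing `with_tail=False` removes the named elements but leaves their tail text in place, so `etree.strip_elements(root, "title", with_tail=False)` does the same job on an already-parsed tree. It is lxml-only and not available in the stdlib fallback.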
Thanks,
-James
On Sun, Jul 27, 2008 at 3:38 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> James Sulak wrote:
>> I'm attempting to use xml.sax.saxutils.XMLFilterBase and XMLGenerator to
>> take an input XML document, filter out certain elements, and output
>> the result to a second XML file. I have it mostly working, except
>> that I lose the DTD declaration and anything (processing instructions
>> or comments) before the root element. I believe I'm supposed to be
>> using a LexicalHandler to get the information from the DTD, but I have
>> not been able to figure out how to do this, or how to integrate it
>> with the rest of the code.
>>
>> I'm pretty new at using Python (and SAX, for that matter) to work with
>> XML
>
> Try lxml's iterparse() instead of SAX. It will build an in-memory tree
> (including the DTD or its reference if you want, see the parser docs), but you
> can remove the unwanted elements from the tree while it parses. It's still
> pretty memory friendly and definitely a lot easier to work with than SAX.
>
> http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
> http://codespeak.net/lxml/tutorial.html#parsing-from-strings-and-files
>
> Stefan
>
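For comparison, the original SAX route can also keep the prolog. Below is a minimal sketch, assuming Python 3's `xml.sax` with its default expat reader. That reader accepts a lexical handler through `setProperty(property_lexical_handler, ...)`. The handler here is duck-typed: an object with `startDTD`, `endDTD`, `comment`, `startCDATA` and `endCDATA` methods. The reader reports the DOCTYPE name and public/system ids via `startDTD()` and comments via `comment()`. Processing instructions before the root already reach `XMLGenerator` as ordinary content events. `PrologWriter`, `DropTitles` and `filter_xml` are hypothetical names, and dropping `<title>` stands in for "certain elements". In SAX the tail-text problem from the lxml version does not arise, because text after `</title>` arrives as separate `characters()` calls outside the dropped element.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import property_lexical_handler
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class PrologWriter:
    """Duck-typed lexical handler that echoes the DOCTYPE and comments."""

    def __init__(self, out):
        self._out = out  # the same text stream the XMLGenerator writes to

    def startDTD(self, name, public_id, system_id):
        if public_id:
            decl = f'<!DOCTYPE {name} PUBLIC "{public_id}" "{system_id}">'
        elif system_id:
            decl = f'<!DOCTYPE {name} SYSTEM "{system_id}">'
        else:
            decl = f"<!DOCTYPE {name}>"
        self._out.write(decl + "\n")

    def endDTD(self):
        pass

    def comment(self, content):
        self._out.write(f"<!--{content}-->")

    # expat also wires up CDATA callbacks; the section text itself still
    # arrives through characters(), so these can be no-ops.
    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


class DropTitles(XMLFilterBase):
    """Filter that removes <title> elements and everything inside them."""

    def __init__(self, parent):
        super().__init__(parent)
        self._skip = 0  # nesting depth inside a dropped <title>

    def startElement(self, name, attrs):
        if self._skip or name == "title":
            self._skip += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._skip:
            self._skip -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._skip:
            super().characters(content)


def filter_xml(source, out):
    filt = DropTitles(make_parser())
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    # XMLFilterBase forwards setProperty() to the underlying expat reader.
    filt.setProperty(property_lexical_handler, PrologWriter(out))
    filt.parse(source)


if __name__ == "__main__":
    doc = (b'<?xml version="1.0"?>\n<!DOCTYPE doc SYSTEM "doc.dtd">\n'
           b"<!-- note --><?pi data?>\n<doc><title>T</title>tail<p>x</p></doc>")
    result = io.StringIO()
    filter_xml(io.BytesIO(doc), result)
    print(result.getvalue())
```

This sketch does not write an internal DTD subset (declarations between `[` and `]`) back out. Comments inside a dropped element are still written, because they go straight to `PrologWriter` and bypass the filter. `PrologWriter` writes `str`, so `out` should be a text stream such as `io.StringIO` or a file opened with `open(path, "w", encoding="utf-8")`.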