Python script to optimize XML text

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Sep 24 22:19:13 EDT 2007


En Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey <rcdailey at gmail.com>  
escribi�:

> I'm currently seeking a python script that provides a way of optimizing  
> out
> useless characters in an XML document to provide the optimal size for the
> file. For example, assume the following XML script:
>
> <root>
>     <Test></Test>
>     <!-- <CommentedOutElement/> -->
>
>     <!-- Do Something Else -->
> </root>
>
> By running this through an XML optimizer, the file would appear as:
>
> <root><Test/></root>

ElementTree does that almost for free. I've just posted an example.

source = """<root>
      <Test></Test>
      <!-- <CommentedOutElement/> -->

      <!-- Do Something Else -->
</root>"""

import xml.etree.ElementTree as ET
tree = ET.XML(source)
print ET.tostring(tree)

Output:
<root>
      <Test />



</root>

If you still want to remove all whitespace:

def stripws(node):
     if node.text:
         node.text = node.text.strip()
     if node.tail:
         node.tail = node.tail.strip()
     for child in node.getchildren():
         stripws(child)
stripws(tree)
print ET.tostring(tree)

Output:
<root><Test /></root>

-- 
Gabriel Genellina




More information about the Python-list mailing list