Python script to optimize XML text

Stefan Behnel stefan.behnel-n05pAM at web.de
Tue Sep 25 03:32:21 EDT 2007


Gabriel Genellina wrote:
> On Mon, 24 Sep 2007 17:36:05 -0300, Robert Dailey <rcdailey at gmail.com>
> wrote:
> 
>> I'm currently seeking a Python script that provides a way of optimizing
>> out useless characters in an XML document to provide the optimal size for
>> the file. For example, assume the following XML document:
>>
>> <root>
>>     <Test></Test>
>>     <!-- <CommentedOutElement/> -->
>>
>>     <!-- Do Something Else -->
>> </root>
>>
>> By running this through an XML optimizer, the file would appear as:
>>
>> <root><Test/></root>
> 
> ElementTree does that almost for free.

As the OP is currently using lxml.etree (and as this was a cross-post to
c.l.py and lxml-dev), I already answered on the lxml list.

This is just to mention that the XMLParser of lxml.etree accepts keyword
options (remove_blank_text, remove_comments, remove_pis) to ignore plain
whitespace content, comments and processing instructions, and that you can
provide a DTD to tell it which whitespace-only content really is "useless"
in the sense of your specific application.
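
A minimal sketch of those parser options, applied to the document from the
original question. The helper name strip_xml and the SAMPLE constant are made
up for illustration, and the DTD part is left out, so blank text is dropped
only according to the parser's built-in heuristics:

```python
from lxml import etree

# The example document from the original question.
SAMPLE = b"""<root>
    <Test></Test>
    <!-- <CommentedOutElement/> -->

    <!-- Do Something Else -->
</root>"""


def strip_xml(data):
    """Parse XML bytes and serialize them again without ignorable content.

    Hypothetical helper: it only wires up the XMLParser keyword options.
    """
    parser = etree.XMLParser(
        remove_blank_text=True,  # drop whitespace-only text between elements
        remove_comments=True,    # drop <!-- ... --> comments
        remove_pis=True,         # drop <?...?> processing instructions
    )
    root = etree.fromstring(data, parser)
    return etree.tostring(root)


if __name__ == "__main__":
    print(strip_xml(SAMPLE).decode("ascii"))
```

Without a DTD the parser cannot know whether whitespace inside a given
element is significant, so it is worth checking the output against documents
with mixed content before relying on this.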

Stefan


