Remove whitespaces and line breaks in a XML file
Stefan Behnel
stefan_ml at behnel.de
Mon Feb 7 15:54:30 EST 2011
David Vicente, 07.02.2011 18:45:
> I'm parsing an XML file with xml.etree. It works correctly, but I have a
> problem with the text attribute of elements that should be empty. For
> example, in this case:
>
> <book>
>
> <author>Ken</author>
>
> </book>
>
> The text attribute of “book” should be empty, but it returns some
> whitespace and line breaks. I can't remove this whitespace without
> also removing information.
Only a DTD (or schema) can tell which whitespace in an XML document is
meaningful and which isn't, so there is no generic way to "do it right",
especially not for something as generic as an XML parser.
What may work for you is to check whether an Element has children and only
whitespace as text ("not el.text or not el.text.strip()"), and only then
replace the text with None.
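A minimal sketch of that check, assuming the standard library's
xml.etree.ElementTree. The function name strip_whitespace_only_text and the
sample document are made up for illustration, and whether dropping this
whitespace is safe still depends on your document format:

```python
import xml.etree.ElementTree as ET


def strip_whitespace_only_text(root):
    """Set el.text to None on elements that have children and only whitespace as text.

    Hypothetical helper, not part of xml.etree.
    """
    for el in root.iter():  # iter() visits root itself and all descendants
        has_children = len(el) > 0
        if has_children and (not el.text or not el.text.strip()):
            el.text = None
    return root


# Sample document modelled on the example in the question.
root = ET.fromstring("<book>\n\n<author>Ken</author>\n\n</book>")
strip_whitespace_only_text(root)

print(repr(root.text))                 # None
print(repr(root.find("author").text))  # 'Ken'
```

Leaf elements such as "author" are left alone, since whitespace inside them
may well be data. Note that the whitespace following a child element is kept
in that child's .tail, not in the parent's .text, so the same test would have
to be applied to el.tail if that whitespace should go as well.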
Stefan