[XML-SIG] Parsing XML file with Minidom has problem with cr/lf

Stefan Behnel stefan_ml at behnel.de
Sun May 9 19:27:05 CEST 2010


Peterson, Wayne, 09.05.2010 08:43:
> I am parsing an XML file with Python 2.6.5 minidom in Windows and it is
> mostly working but minidom seems to have problems dealing with Windows
> cr/lf characters. It creates an extra textnode that needs to be ignored
> instead of just returning the xml elements. I have tried different
> methods of opening the file but it doesn't seem to make a difference. It
> is happiest when reading a file in Unix format.

Whitespace is significant in the W3C DOM, so minidom must provide it in the
DOM tree. Creating text nodes for the whitespace between elements is not a
"problem" in minidom; that's just the way the DOM works.
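
If you only want the elements, you can filter the child nodes by node type.
Here is a minimal sketch of that; the sample document and the helper name
element_children are made up for illustration, they are not from your file:

```python
from xml.dom import minidom

# Hypothetical sample with Windows-style line breaks between the elements.
SAMPLE = "<root>\r\n  <item>a</item>\r\n  <item>b</item>\r\n</root>"


def element_children(node):
    """Return only the element children of node, skipping text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# childNodes holds the whitespace text nodes as well as the two elements.
all_children = root.childNodes

# The filtered list holds just the <item> elements.
items = element_children(root)
item_names = [item.tagName for item in items]
```

The whitespace text nodes are still in the tree; the filter just skips them
when you iterate.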

Note that the xml.etree.ElementTree package tends to be a lot more
user-friendly for XML handling than the minidom package, simply because it
focuses on the XML Infoset and moves text out of the way when dealing with
elements.
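
A rough sketch of the same thing with ElementTree, again on a made-up sample
document: iterating over an element yields only its child elements, and the
whitespace ends up in the .text and .tail attributes instead of in separate
nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with Windows-style line breaks between the elements.
SAMPLE = "<root>\r\n  <item>a</item>\r\n  <item>b</item>\r\n</root>"

root = ET.fromstring(SAMPLE)

# Iterating over an element yields child elements only.
tags = [child.tag for child in root]
values = [child.text for child in root]

# The whitespace between elements is kept, but as plain string attributes:
# root.text holds what precedes the first child, child.tail what follows it.
leading_whitespace = root.text
trailing_whitespace = [child.tail for child in root]
```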

Stefan

