[XML-SIG] Parsing XML file with Minidom has problem with cr/lf
dieter at handshake.de
Mon May 10 09:07:55 CEST 2010
Stefan Behnel wrote at 2010-5-10 08:57 +0200:
>Dieter Maurer, 10.05.2010 07:50:
>> Peterson, Wayne wrote at 2010-5-8 23:43 -0700:
>>> I am parsing an XML file with Python 2.6.5 minidom on Windows, and it is
>>> mostly working, but minidom seems to have problems dealing with Windows
>>> cr/lf characters. It creates an extra text node that needs to be ignored
>>> instead of just returning the XML elements. I have tried different
>>> methods of opening the file, but it doesn't seem to make a difference. It
>>> is happiest when reading a file in Unix format.
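A minimal sketch of the usual workaround. It assumes the extra nodes are the whitespace-only text nodes that minidom keeps for the line breaks and indentation between elements, which also appear with Unix line endings. The sample document and the `element_children` helper are hypothetical, and the code targets Python 3 rather than the 2.6.5 mentioned above.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the newline + indentation between elements is
# character data, so minidom keeps it as text nodes.
SAMPLE = b"<root>\n  <item>one</item>\n  <item>two</item>\n</root>\n"


def element_children(node):
    """Hypothetical helper: return only the element children of node,
    dropping text nodes (whitespace-only or not), comments, etc."""
    return [child for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE]


doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Text nodes (3) interleaved with element nodes (1).
print([c.nodeType for c in root.childNodes])
# Only the <item> elements remain after filtering.
print([c.tagName for c in element_children(root)])
```

Filtering by node type keeps the parse itself untouched; the alternative of stripping whitespace from the input would also alter whitespace the document may need.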
>> The parser should not see these "cr/lf" characters at all.
>> Python strings themselves use only "\n" (aka "lf") to delimit lines.
>> The "\r" (aka "cr") should only be introduced when those lines
>> are written to text files. And they should be removed when
>> those lines are read in again.
>> Are you sure that you access your files as "text" files?
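A sketch of the text-mode translation described in the quoted paragraph. It uses Python 3's `io.TextIOWrapper` over in-memory bytes so it runs on any platform. This is an assumption about environment: the thread is about Python 2.6.5, where opening a file in text mode on Windows performs the same translation and mode "U" enables universal newlines.

```python
import io

# Reading: with newline=None (the default), text mode recognises "\r\n",
# "\r" and "\n" and returns all of them to the program as "\n".
raw_in = io.BytesIO(b"<root>\r\n  <item/>\r\n</root>\r\n")
text = io.TextIOWrapper(raw_in, encoding="utf-8").read()
print(repr(text))

# Writing: newline="\r\n" (what text mode effectively uses on Windows)
# turns every "\n" written by the program back into "\r\n".
raw_out = io.BytesIO()
writer = io.TextIOWrapper(raw_out, encoding="utf-8", newline="\r\n")
writer.write("<root/>\n")
writer.flush()  # push buffered text into raw_out before inspecting it
print(raw_out.getvalue())
```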
>The correct way to parse XML files is as binary data.
Why do you think so?
The default "minidom" parser does not seem to expect "\r\n" line endings...
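A sketch for checking in practice how minidom treats the line endings when it is handed binary data, as the quoted reply suggests. It relies on the XML 1.0 rule (section 2.11, "End-of-Line Handling") that a parser must report "\r\n" and a lone "\r" as "\n"; expat, which minidom uses by default, implements this. For a file on disk, the equivalent would be opening it with mode "rb" and passing the file object to `minidom.parse`. `parse_bytes` is a hypothetical helper, and Python 3 is assumed.

```python
from xml.dom import minidom


def parse_bytes(data):
    # Hypothetical helper: hand the parser raw bytes so that it does the
    # decoding (from the XML declaration, else UTF-8) and applies the
    # XML end-of-line normalisation itself.
    return minidom.parseString(data)


lf = parse_bytes(b"<root>\n  <item>a\nb</item>\n</root>")
crlf = parse_bytes(b"<root>\r\n  <item>a\r\nb</item>\r\n</root>")

# Both documents serialise identically: every "\r\n" was reported as "\n".
print(lf.toxml() == crlf.toxml())
print(repr(crlf.getElementsByTagName("item")[0].firstChild.data))
```

If the two serialisations match for a given input, the CRLF line endings are not what produces the extra text nodes.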