[XML-SIG] Parsing XML file with Minidom has problem with cr/lf

Dieter Maurer dieter at handshake.de
Mon May 10 09:07:55 CEST 2010


Stefan Behnel wrote at 2010-5-10 08:57 +0200:
>Dieter Maurer, 10.05.2010 07:50:
>> Peterson, Wayne wrote at 2010-5-8 23:43 -0700:
>>> I am parsing an XML file with Python 2.6.5 minidom in Windows and it is
>>> mostly working but minidom seems to have problems dealing with Windows
>>> cr/lf characters. It creates an extra text node that needs to be ignored
>>> instead of just returning the XML elements. I have tried different
>>> methods of opening the file but it doesn't seem to make a difference. It
>>> is happiest when reading a file in Unix format.
>>
>> The parser should not see these "cr/lf" characters at all.
>>
>> Python strings themselves use only "\n" (aka "lf") to delimit lines.
>> The "\r" (aka "cr") should only be introduced when those lines
>> are written to text files. And they should be removed when
>> those lines are read in again.
>>
>> Are you sure that you access your files as "text" files?
>
>The correct way to parse XML files is as binary data.

Why do you think so?

The default "minidom" parser does not seem to expect "\r\n" line endings...
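
For illustration, a minimal sketch (not from the original report; the
sample documents and the helper names are made up) that parses the same
document from bytes, once with "\r\n" and once with "\n" line endings.
An XML processor is supposed to normalize line breaks to "\n" (XML 1.0,
section 2.11), and whitespace between elements is reported as text nodes
whatever the line-ending style, so both inputs should give the same
children:

```python
from xml.dom import minidom

# Hypothetical sample documents: same content, different line endings.
CRLF_DOC = b"<root>\r\n  <a/>\r\n  <b/>\r\n</root>"
LF_DOC = b"<root>\n  <a/>\n  <b/>\n</root>"


def child_names(data):
    """Parse XML bytes and list the node names of the root's children."""
    root = minidom.parseString(data).documentElement
    return [child.nodeName for child in root.childNodes]


def element_children(node):
    """Return only the element children of node, skipping text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == minidom.Node.ELEMENT_NODE]


if __name__ == "__main__":
    for label, data in (("cr/lf", CRLF_DOC), ("lf", LF_DOC)):
        root = minidom.parseString(data).documentElement
        print(label, child_names(data),
              [e.tagName for e in element_children(root)])
```

If the whitespace-only text nodes show up for both inputs, they are not
specific to Windows line endings, and filtering on the node type (as in
element_children above) is one way to skip them.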



--
Dieter

