[issue5752] xml.dom.minidom does not escape newline characters within attribute values

Tomalak report at bugs.python.org
Wed May 13 13:44:37 CEST 2009


Tomalak <M8R-t1tut01 at mailinator.com> added the comment:

Francesco,

> if you want to encode the newline character, 
> this should be done by both parseString and 
> setAttribute methods. Otherwise, the 
> behaviour is not symmetric.

I believe you still don't see the issue. The behaviour is not symmetric
*now*. You store a '\n' in an attribute value with setAttribute(), save
the document to XML, load it again and out comes a space where the '\n'
should have been.

The point is that parseString() behaves correctly, but serializing does
not. There is only one side to fix, because only one side is broken.

> If you want to encode the newline in different 
> manner, you should develop a patch that
> introduces this kind of encoding in both 
> parseString and setAttribute methods.

It would be pointless to do the encoding in setAttribute(). The valid
ways to XML-encode a '\n' character are '&#xA', '&#x0A' or '&#10'. Doing
so in setAttribute() would produce doubly encoded output, like this:
'&amp;#10'. This is even more wrong.

However, if parseString() encounters a '&#10' in the input, it correctly
translates this to '\n' in the DOM. As I said, there is nothing to fix
in parsing, this exercise is about getting minidom to actually *output*
a '&#10;' where appropriate. :-)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5752>
_______________________________________


More information about the Python-bugs-list mailing list