[New-bugs-announce] [issue7139] Incorrect serialization of end-of-line characters in attribute values

Moriyoshi Koizumi report at bugs.python.org
Thu Oct 15 08:21:30 CEST 2009


New submission from Moriyoshi Koizumi <mozo+python at mozo.jp>:

ElementTree does not correctly serialize end-of-line characters (#xA,
#xD) in attribute values.  The parser converts literal end-of-line
characters in attribute values to #x20, as the specification requires
[1], so an end-of-line character that survives parsing must have been
written as a character reference (&#10; or &#13;) in the original
document.  The serializer must write it back as a character reference;
otherwise it is lost the next time the output is parsed.

[1] http://www.w3.org/TR/xml11/#AVNormalize   
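
The normalization rule can be observed directly on the parsing side. The following is a minimal sketch, not part of the original report, written to run on a current Python interpreter; the element and attribute names are only illustrative. It parses the same line break once as a literal character and once as a character reference.

```python
import xml.etree.ElementTree as ET

# A literal line break inside the attribute value (the "\n" below is a real
# newline character in the document text handed to the parser).
literal = ET.fromstring('<foo attr="a\nb"/>')

# The same kind of characters written as character references.
referenced = ET.fromstring('<foo attr="a&#10;b&#13;c"/>')

# The literal line break is normalized to a single space (#x20).
literal_value = literal.get("attr")

# Characters given as references are kept as #xA and #xD.
referenced_value = referenced.get("attr")
```

Only the referenced form keeps the end-of-line characters, which is why a serializer that writes them back as bare characters changes the attribute value on the next parse.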

### sample code

from xml.etree.ElementTree import ElementTree
from cStringIO import StringIO

# builder = ElementTree(file=StringIO("<foo>\x0d</foo>"))
# out = StringIO()
# builder.write(out)
# print out.getvalue()

out = StringIO()
ElementTree(file=StringIO(
'''<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ATTLIST foo attr CDATA "">
]>
<foo attr="   test
&#13;test&#32; test&#10;a  ">&#10;</foo>
''')).write(out)
# should be: <foo attr="   test &#13;test  test&#10;a  ">\x0a</foo>
print out.getvalue()

out = StringIO()
ElementTree(file=StringIO(
'''<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ATTLIST foo attr NMTOKENS "">
]>
<foo attr="   test
&#13;test&#32; test&#10;a  ">&#10;</foo>
''')).write(out)
# should be: <foo attr="test &#13;test test&#10;a">\x0a</foo>
print out.getvalue()
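
For comparison, here is a rough sketch of the escaping that would make the CDATA case above round-trip. `escape_attr_value` is a hypothetical helper written for this illustration, not ElementTree's own code, and the sketch assumes a current Python interpreter rather than 2.6. Escaping the tab character as well is an extra assumption based on the same normalization rule.

```python
import xml.etree.ElementTree as ET


def escape_attr_value(value):
    """Hypothetical helper: escape an attribute value so it survives re-parsing."""
    # Markup characters first; '&' must be handled before any step that
    # introduces new '&' characters.
    value = value.replace("&", "&amp;")
    value = value.replace("<", "&lt;")
    value = value.replace('"', "&quot;")
    # Whitespace that attribute-value normalization would otherwise
    # turn into #x20 when the output is parsed again.
    value = value.replace("\r", "&#13;")
    value = value.replace("\n", "&#10;")
    value = value.replace("\t", "&#9;")
    return value


# The attribute value the parser yields for the CDATA sample above.
original = "   test \rtest  test\na  "

serialized = '<foo attr="%s"/>' % escape_attr_value(original)

# Parsing the serialized form again gives back the same value.
round_tripped = ET.fromstring(serialized).get("attr")
```

With this escaping, `serialized` carries `&#13;` and `&#10;` in the attribute, matching the expected output noted in the first sample.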

----------
components: XML
messages: 94074
nosy: moriyoshi
severity: normal
status: open
title: Incorrect serialization of end-of-line characters in attribute values
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7139>
_______________________________________