Encoding newlines in XML?

Fredrik Lundh fredrik at pythonware.com
Tue Mar 21 18:22:11 CET 2006


Robert Kern wrote:

> Other libraries seem to get this right.
>
> In [89]: from lxml import etree
>
> In [90]: e = etree.Element('SomeTag', text="def _f():\n  return 3\n")
>
> In [93]: e.attrib
> Out[93]: {'text': 'def _f():\n  return 3\n'}
>
> In [94]: etree.dump(e)
> <SomeTag text="def _f():&#10;  return 3&#10;"/>
>
> In [96]: etree.dump(etree.XML('<SomeTag text="def _f():&#10;  return 3&#10;"/>'))
> <SomeTag text="def _f():&#10;  return 3&#10;"/>
>
> I'll bet good money that ElementTree also gets this right.

well, not quite:

>>> import cElementTree as ET
>>> e = ET.Element('SomeTag', text="def _f():\n  return 3\n")
>>> e
<Element 'SomeTag' at 0091A320>
>>> e.attrib
{'text': 'def _f():\n  return 3\n'}
>>> ET.tostring(e)
'<SomeTag text="def _f():\n  return 3\n" />'
>>> ET.tostring(ET.XML(ET.tostring(e)))
'<SomeTag text="def _f():   return 3 " />'
>>> ET.tostring(ET.XML(ET.tostring(e).replace("\n", "&#10;")))
'<SomeTag text="def _f():\n  return 3\n" />'
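
The last step above patches the whole serialized string. A minimal sketch of the same idea applied to the attribute value alone, before it is written out, follows. It assumes Python 3 and the standard library's xml.etree.ElementTree for parsing; escape_attrib is a hypothetical helper for illustration, not part of any ElementTree API.

```python
import xml.etree.ElementTree as ET


def escape_attrib(value):
    # Hypothetical helper (not an ElementTree function): escape markup
    # characters, then turn whitespace control characters into character
    # references so the parser's attribute-value normalization cannot
    # collapse them into plain spaces.
    value = value.replace("&", "&amp;")  # must run first
    value = value.replace("<", "&lt;")
    value = value.replace('"', "&quot;")
    value = value.replace("\r", "&#13;")
    value = value.replace("\n", "&#10;")
    value = value.replace("\t", "&#9;")
    return value


source = "def _f():\n  return 3\n"

# Literal newlines inside the attribute: the parser replaces each with a space.
raw = '<SomeTag text="%s" />' % source
# Newlines written as character references: they survive parsing.
escaped = '<SomeTag text="%s" />' % escape_attrib(source)

print(repr(ET.XML(raw).get("text")))
print(repr(ET.XML(escaped).get("text")))
```

The first print shows the newlines turned into spaces, as in the unpatched round trip above; the second shows the original string.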

I don't recommend putting non-trivial formatted stuff in attributes
(the resulting files look really messy, and not all parsers handle it
correctly), but the serializer should be fixed so that newlines in
attribute values are written as &#10;.
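
For comparison, a small sketch of the alternative hinted at here: keeping the formatted text as element content instead of in an attribute. It assumes Python 3 and the standard library's xml.etree.ElementTree (the session above used cElementTree), and reuses the SomeTag name from the example only for illustration.

```python
import xml.etree.ElementTree as ET

source = "def _f():\n  return 3\n"

# Store the code as element text rather than as an attribute value.
e = ET.Element("SomeTag")
e.text = source

# encoding="unicode" makes tostring() return a str instead of bytes.
serialized = ET.tostring(e, encoding="unicode")

# Attribute-value normalization does not apply to character data,
# so the line feeds come back unchanged after parsing.
roundtrip = ET.XML(serialized).text

print(repr(serialized))
print(repr(roundtrip))
```

Newlines in element content need no escaping to round-trip, which is one reason element text is a more comfortable place for multi-line data.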

</F>