[I originally wrote this for StackOverflow, hence all the backticks; I've left them in to delimit strings.]

Not sure if this is a bug or me failing to understand `CDATA`.

I have some Python 2 `unicode` objects I want to include in the text of my XML document; I've been asked to put them into `CDATA`, so I used `element.text = lxml.etree.CDATA(mytext)`.

However, `þ` (repr: `u'\xfe'`) and `&#254;` (repr: `u'&#254;'`) seem to produce the same XML output, despite being very different strings: `<root><![CDATA[&#254;]]></root>`, which lxml parses as `&#254;`.

Can someone confirm it's a bug, or correct my wrongheadedness about using CDATA?

BTW: `element.text = mytext` (i.e. no CDATA involved) works as I'd expect.

    >>> lxml.etree.__version__
    u'3.2.4'

_test code:_

    import lxml.etree
    import sys

    # Decode the command-line argument into a unicode object
    literal = sys.argv[1].decode('utf-8')
    print repr(literal)

    # Wrap the text in CDATA, then serialize with the default encoding
    literalroot = lxml.etree.fromstring("<root></root>")
    literalroot.text = lxml.etree.CDATA(literal)
    xml = lxml.etree.tostring(literalroot)
    print xml

    # Compare the original string with the round-tripped text
    print literal, lxml.etree.fromstring(xml).text

_output:_

    $ python test.py "þ"
    u'\xfe'
    <root><![CDATA[&#254;]]></root>
    þ &#254;

    $ python test.py "&#254;"
    u'&#254;'
    <root><![CDATA[&#254;]]></root>
    &#254; &#254;
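
For what it's worth, the parsing half of this can be checked without lxml. Below is a minimal sketch, assuming Python 3 and the standard library's `xml.etree.ElementTree` (so it is not the lxml code path from my test, and the variable names are just for illustration). It shows that a character reference such as `&#254;` (the reference for `þ`) is treated as plain text inside a CDATA section, but is decoded when it appears in ordinary element text.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "&#254;" is six literal characters,
# not a reference to U+00FE.
cdata_text = ET.fromstring("<root><![CDATA[&#254;]]></root>").text

# In ordinary element text, the same characters are a character
# reference and the parser decodes them to U+00FE.
plain_text = ET.fromstring("<root>&#254;</root>").text

print(repr(cdata_text))
print(repr(plain_text))
```

So if a serializer writes a character reference inside a CDATA section, a parser cannot tell it apart from a string that literally contained those characters, which would explain why both inputs read back identically.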