hi there, I'm quite confused because putting an & into the text field of an element causes the & to be converted to &amp;? example:
    from lxml import etree
    root = etree.Element('test')
    root.text = '&'
    print(etree.tostring(root))
    b'<test>&amp;</test>'
how would I turn that off? I suppose that's not a bug... it's a feature ^^

btw, is there any way to stop lxml.html.parse(...) from converting my &#<number>; into the true unicode chars?

I have:
python3.2 from the ubuntu 11.04 repo
lxml from the ubuntu 11.04 repo for python3

plan_rich
On 06/01/2011 04:06 PM, Richard Plangger wrote:
> hi there,
> I'm quite confused because putting an & into the text field of an element causes the & to be converted to &amp;?
> example:
> from lxml import etree
> root = etree.Element('test')
> root.text = '&'
> print(etree.tostring(root))
> b'<test>&amp;</test>'
> how would I turn that off?
> I suppose that's not a bug... it's a feature ^^

"Naked" ampersands are not allowed in XML text. '&amp;' is the "escaped" spelling.

Tres.
then how would I put something like &#220; into my xml file? because it converts it to &amp;#220; all the time. (not just 220)

plan_rich
On 2 June 2011 09:07, Richard Plangger <Richard.Plangger@gmx.net> wrote:
> then how would i put something like &#220; into my xml file? cause it converts it to &amp;#220; all the time. (not just 220)
You need to add the equivalent unicode character to the tree (lxml will only decode the input when parsing). For &#220; that would be u"\u00dc" (as 220 in decimal is dc in hexadecimal).

Character references are converted as part of parsing a document by lxml. If you want them in your output then you must serialise the document in an encoding that cannot natively represent those characters, e.g. ascii, using etree.tostring(document, encoding="ascii"). You can get the same effect by calling .encode('ascii', 'xmlcharrefreplace') on the equivalent unicode.

Laurence
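The advice above could look roughly like the following sketch. The variable names are illustrative, and the fallback import is an assumption so the snippet also runs where lxml is not installed. Note that on Python 3.2 the u"..." prefix is not accepted; a plain string literal is already unicode there, so the example uses '\u00dc'.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree behaves the same way for this example.
    import xml.etree.ElementTree as etree

root = etree.Element('test')
root.text = '\u00dc'   # the character itself (code point 220), not the string '&#220;'

# ASCII cannot represent the character, so the serialiser writes a character reference.
as_ascii = etree.tostring(root, encoding='ascii')
print(as_ascii)        # contains &#220;

# UTF-8 can represent it, so the character is written as raw bytes instead.
as_utf8 = etree.tostring(root, encoding='utf-8')
print(as_utf8)         # contains b'\xc3\x9c'

# The same conversion on a plain string, without any XML involved.
reference = '\u00dc'.encode('ascii', 'xmlcharrefreplace')
print(reference)       # b'&#220;'

# Parsing always decodes the reference back into the character.
reparsed = etree.fromstring(as_ascii)
print(reparsed.text == '\u00dc')   # True
```

Setting root.text = '&#220;' instead stores the six literal characters, which is why the '&' in it gets escaped on output.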
participants (3)
- Laurence Rowe
- Richard Plangger
- Tres Seaver