Re: [lxml] Change element text and keep CDATA if present
Hi,
When changing the text of an element, the CDATA section gets stripped away.
I'm aware of the CDATA class, but I only want to apply it if the element had a CDATA section before the text change. I'm also already parsing the XML data with 'strip_cdata=False'.
Something like:

    newText = someTransformation(element.text)
    element.text = CDATA(newText) if hasCDATA(element) else newText

But I can't figure out how to implement the 'hasCDATA' check. Thanks for any hints!
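The behaviour described above can be reproduced with a short lxml sketch (the `<gui>`/`<code>` element names are made up for illustration). With `strip_cdata=False` the CDATA section survives parsing and serialization, `.text` only returns the plain string, assigning a plain string replaces the CDATA section with escaped text, and `etree.CDATA()` brings it back.

```python
from lxml import etree

# strip_cdata=False keeps CDATA sections as such in the parsed tree.
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(
    b"<gui><code><![CDATA[if a < b: pass]]></code></gui>", parser
)
code = root.find("code")

# .text is just the string content; the CDATA wrapper is not visible here.
print(code.text)  # if a < b: pass

# Serializing shows that the CDATA section survived parsing ...
original = etree.tostring(root)
# b'<gui><code><![CDATA[if a < b: pass]]></code></gui>'

# ... but assigning a plain string replaces it with escaped character data.
code.text = "if a < c: pass"
after_plain = etree.tostring(root)
# b'<gui><code>if a &lt; c: pass</code></gui>'

# Wrapping the new value in etree.CDATA() produces a CDATA section again.
code.text = etree.CDATA("if a < c: pass")
after_cdata = etree.tostring(root)
# b'<gui><code><![CDATA[if a < c: pass]]></code></gui>'
```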
As per https://mailman-mail5.webfaction.com/pipermail/lxml/2020-April/024092.html it looks like you currently can't check whether text originated from CDATA (and you probably shouldn't need to anyway ;-), but I couldn't find the OP's motivation in that thread).

Best regards,
Holger
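lxml doesn't offer a query for this, but for elements without children one rough heuristic (an assumption of this sketch, not an lxml API) is to serialize the element and look for a literal `<![CDATA[`: in ordinary text and attribute values `<` is always escaped, so that marker can only come from a real CDATA section. It only works on trees parsed with `strip_cdata=False`; with the default parser the CDATA is merged into plain text before you ever see it. `has_cdata` and `some_transformation` are hypothetical names mirroring the pseudocode in the question.

```python
from lxml import etree


def has_cdata(element):
    """Heuristic: does this childless element's text serialize as CDATA?

    Only meaningful for trees parsed with strip_cdata=False. This inspects
    the serialized output; it is not an lxml API for querying CDATA.
    """
    if len(element):
        # Children (including comments and PIs) could contain CDATA of their
        # own, so the simple substring test below would be unreliable.
        raise ValueError("has_cdata() only supports elements without children")
    # '<' is always escaped in ordinary text and attribute values, so a
    # literal '<![CDATA[' can only come from a real CDATA section.
    # with_tail=False ignores any CDATA in the text that follows the element.
    return b"<![CDATA[" in etree.tostring(element, with_tail=False)


def some_transformation(text):
    # Hypothetical stand-in for the real edit (e.g. reindenting the code).
    return text.strip()


parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(
    b"<gui><code><![CDATA[ x = a < b ]]></code><code> y = 2 </code></gui>",
    parser,
)
for el in root.iter("code"):
    new_text = some_transformation(el.text)
    # Check *before* assigning, since the assignment discards the CDATA node.
    el.text = etree.CDATA(new_text) if has_cdata(el) else new_text

print(etree.tostring(root))
# b'<gui><code><![CDATA[x = a < b]]></code><code>y = 2</code></gui>'
```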
Hi,
> As per https://mailman-mail5.webfaction.com/pipermail/lxml/2020-April/024092.html it looks like you currently can't check whether text originated from CDATA (and you probably shouldn't need to anyway ;-), but I couldn't find the OP's motivation in that thread).
Our use case is a bit special: we have an XML-based GUI definition format which contains Python code for handling GUI events. CDATA is used so that certain characters don't have to be escaped, because the escaping reduces the readability of the Python code quite a bit. Because the indentation of the Python code is a bit messed up, I wanted to use reindent.py to reindent the Python code in all of our GUI files, of which there are quite a few. So here I am with my CDATA problem ;).

Greetings,
Daniel
Daniel Trstenjak wrote on 03.11.20 at 19:32:
> Our use case is a bit special: we have an XML-based GUI definition format which contains Python code for handling GUI events. CDATA is used so that certain characters don't have to be escaped, because the escaping reduces the readability of the Python code quite a bit.
> Because the indentation of the Python code is a bit messed up, I wanted to use reindent.py to reindent the Python code in all of our GUI files, of which there are quite a few.
> So here I am with my CDATA problem ;).
What I would do is read the XML, walk through the tags that contain Python code, explicitly reindent only their text content, and then write the whole XML file back out. That seems simple enough, and much safer than trying to handle the Python code at the byte-stream level.

Stefan
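A minimal sketch of that approach, assuming the Python code lives in elements named `<script>` (a hypothetical name; adjust it to the real GUI format). Since reindent.py is a command-line script rather than part of the importable standard library, `textwrap.dedent` stands in for it here; it only strips common leading whitespace and does not normalize indentation widths the way reindent.py does. As written, plain string assignment stores the code as escaped text, so any CDATA wrapper is not kept.

```python
import textwrap

from lxml import etree


def reindent_code_elements(xml_bytes, tag="script"):
    """Parse GUI XML, reindent the text of every <tag> element, re-serialize.

    `tag` is a hypothetical element name; textwrap.dedent is a stand-in
    for reindent.py.
    """
    parser = etree.XMLParser(strip_cdata=False)
    root = etree.fromstring(xml_bytes, parser)
    for el in root.iter(tag):
        if el.text:
            # Only the element's own text is touched; the rest of the
            # document is written back exactly as lxml parsed it. Plain
            # assignment means '<' etc. get escaped instead of kept in CDATA.
            el.text = textwrap.dedent(el.text)
    return etree.tostring(root, encoding="utf-8", xml_declaration=True)


def reindent_file(path, tag="script"):
    """Rewrite one GUI file in place (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(reindent_code_elements(data, tag))
```

Looping `reindent_file` over all GUI files covers the "quite a few files" case; working on the parsed tree means the Python code is only ever handled as element text, never as raw bytes of the file.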
Stefan Behnel wrote on 07.11.20 at 10:44:
> What I would do is read the XML, walk through the tags that contain Python code, explicitly reindent only their text content, and then write the whole XML file back out. That seems simple enough, and much safer than trying to handle the Python code at the byte-stream level.
Just in case the intention wasn't obvious: by doing this, you could simply *always* wrap the code in CDATA(), regardless of how it came in. Afterwards, you can be sure that *everything* is nicely formatted and looks proper, no matter how the original authors got the Python code into the XML.

Stefan
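Following that suggestion, the CDATA wrapping can be applied unconditionally after the text has been processed, so no detection is needed and the default parser is fine. One caveat this sketch assumes matters: a CDATA section cannot contain the sequence `]]>` (which can occur in Python code, e.g. nested indexing followed by a comparison), and lxml's `CDATA()` rejects such text, so the hypothetical helper below falls back to plain, escaped text in that case. The `<script>` tag name is again made up.

```python
from lxml import etree


def as_cdata_if_possible(text):
    """Wrap text in etree.CDATA unless it contains ']]>'.

    ']]>' cannot appear inside a CDATA section (lxml refuses it), so such
    code is left as plain text and gets escaped on serialization instead.
    """
    if "]]>" in text:
        return text
    return etree.CDATA(text)


def wrap_code_in_cdata(root, tag="script"):
    """Store the text of every <tag> element as CDATA where possible."""
    for el in root.iter(tag):
        if el.text:  # skip empty elements
            el.text = as_cdata_if_possible(el.text)


# The input may use escaped text or CDATA; afterwards every script body
# that can legally be CDATA is written as CDATA.
root = etree.fromstring(
    b"<gui><script>if a &lt; b: pass</script>"
    b"<script>x = y[z[0]]&gt;1</script></gui>"
)
wrap_code_in_cdata(root)
print(etree.tostring(root))
```

In the reindent sketch, this would simply replace the plain `el.text = ...` assignment with `el.text = as_cdata_if_possible(...)`.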
participants (3)
- Daniel Trstenjak
- Holger.Joukl@LBBW.de
- Stefan Behnel