On 25-04-2024 at 16:58, Stefan Behnel wrote:
If it's a somewhat straightforward change that doesn't need tons of 
back-and-forth testing and debugging, and you have a github account, you 
could also use their CI service (Github Actions), either on your own 
account or in lxml's account via a pull request.

I have now created a fork of lxml on GitHub [1]. I made my change to the CDATA class, and the CI action started running. 🙂 Some builds failed, which suggests the build matrix may no longer be entirely correct. The Windows builds, however, had no issues, and I was happy to find that wheels had been created as build artifacts, so I could test my change.

As to the actual change: I'm trying to write a conversion program that outputs XML [2]. Its output must match that of an existing program. Semantically it already does, but I'd also like it to handle CDATA the same way. To this end, I'd like to allow "wrapped" CDATA, i.e. CDATA text that itself contains a CDATA section. The CDATA class currently disallows this: it checks for the presence of ']]>', and raises an error if found.

I added a parameter to turn off this check. I expected to need to do the escaping myself, but it seems lxml handles this just fine out of the box. For example, this test script:

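from lxml import etree
from lxml.etree import CDATA


def main():
    root = etree.Element("dummy")
    # Text that itself contains a complete CDATA section, including ']]>'.
    txt = '<root><![CDATA[Something]]></root>'
    # The second argument is the parameter added in my fork; False turns
    # off the check for ']]>'.
    root.text = CDATA(txt, False)
    out = etree.tostring(root).decode()
    print(out)


if __name__ == '__main__':
    main()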

...prints this:

<dummy><![CDATA[<root><![CDATA[Something]]]]><![CDATA[></root>]]></dummy>
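For reference, the splitting visible in that output can be sketched in plain Python, without lxml. This is only my assumption about an equivalent transformation, not lxml's actual implementation, and the name wrap_cdata is made up for illustration. Every ']]>' in the text is cut after its ']]', so the terminator never appears inside a single CDATA section:

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting at every ']]>' it contains.

    Hypothetical helper for illustration; not part of lxml.
    """
    # Close the current section after ']]' and open a new one before '>'.
    escaped = text.replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">")
    return CDATA_START + escaped + CDATA_END


txt = "<root><![CDATA[Something]]></root>"
wrapped = wrap_cdata(txt)
print(wrapped)

# Round trip with the standard library parser: the adjacent CDATA
# sections are read back as one text node equal to the original string.
parsed = ET.fromstring("<dummy>" + wrapped + "</dummy>")
assert parsed.text == txt
```

For the test string above, this produces the same CDATA sections as the lxml output, and a parser reads them back as the original text.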

The lxml output looks good to me (given my constraints, that is 🙂). So I wonder how to proceed. Would you be willing to change anything here? If so, would you prefer a flag to turn off the check, or just to remove the check (or something else)? Depending on your answer, I'll try my hand at a pull request.

(Out of curiosity, I may later study how the CI does the build, and why it succeeds where my manual attempts failed.)

Kind regards,
Gertjan.



[1] https://github.com/gertjanklein/lxml
[2] https://github.com/gertjanklein/iris-udl-to-xml