puzzled by xml declaration
I usually serialize lxml trees (lxml 3.7, Python 3.6) with the command

    print(etree.tostring(tree, encoding="unicode", pretty_print=True))

That command strips the XML declaration from the first line. This happens to matter at some level to eXist, the database I use. I posed a question on this list a couple of weeks ago about how to keep the declaration. Holger Jouki helpfully suggested that I need to add an explicit xml_declaration. There are postings on Stack Overflow, mainly from a few years ago, that say the same thing.

However, if I add xml_declaration=True to the above command, I get the error message

    File "/Users/martin/Dropbox/PycharmProjects/earlyprint/transform/try12.py", line 62, in <module>
      print(etree.tostring(tree, xml_declaration =True, encoding="unicode" , pretty_print=True),file=fileout)
    File "src/lxml/lxml.etree.pyx", line 3320, in lxml.etree.tostring (src/lxml/lxml.etree.c:80187)
    ValueError: Serialisation to unicode must not request an XML declaration

If I formulate the command (as recommended on Stack Overflow) as

    print(etree.tostring(tree, encoding="utf-8", xml_declaration=True, pretty_print=True))

I do indeed get the XML declaration, but the output is printed as a bytes literal (b'...'), with line breaks shown as '\n', which is not what I want.

I assume that encoding="unicode" is, or should be, the common or garden way of serializing in Python 3.x. But how can I make sure that serialization keeps the XML declaration rather than dropping it? There are ways of adding it afterwards, but that is rather kludgy, and it is also easy to forget.

Grateful for any help

MM
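For reference, one possible workaround is sketched below. It is only a sketch, not necessarily what the lxml maintainers would recommend. It asks for a byte encoding so that lxml will emit the declaration, then decodes the bytes back to a str before printing. The helper name serialize_with_declaration and the sample document are made up for illustration.

```python
from lxml import etree


def serialize_with_declaration(tree, encoding="utf-8"):
    """Return pretty-printed XML as a str that starts with an XML declaration.

    Hypothetical helper: lxml refuses xml_declaration=True together with
    encoding="unicode", so serialize to bytes first and decode afterwards.
    """
    raw = etree.tostring(
        tree,
        encoding=encoding,
        xml_declaration=True,
        pretty_print=True,
    )
    # raw is a bytes object; decoding turns it into ordinary text,
    # so print() shows real line breaks instead of b'...\n...'.
    return raw.decode(encoding)


# Made-up sample document, for illustration only.
root = etree.fromstring("<root><child>text</child></root>")
text = serialize_with_declaration(root)
print(text, end="")
```

The resulting string can be passed to print(text, file=fileout) with a file opened in text mode. Alternatively, a file opened in binary mode ("wb") can receive the bytes from etree.tostring directly, without the decode step.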
participants (1)
-
Martin Mueller