[lxml-dev] let lxml write the ?xml pi
Hi, I think this has been asked before, but I can't find it in my archive (or the official one).. Is it possible to tell lxml to add the <?xml version="1.0" encoding=".."?> when serializing an ElementTree? I couldn't find anything about this in the documentation. Andreas -- You have taken yourself too seriously.
On 19.06.06 12:14:02, Andreas Pakulat wrote:
I think this has been asked before, but I can't find it in my archive (or the official one)..
Is it possible to tell lxml to add the <?xml version="1.0" encoding=".."?> when serializing an ElementTree? I couldn't find anything about this in the documentation.
I found the thread again and it seems this is a regression in lxml 1.0.1 here. I can't get lxml to put a Processing Instruction on top of the document, either with tostring() or with write() (providing an encoding of 'utf-8'). Not even when the original file had a PI.
>>> t = etree.XML('<a><b /></a>')
>>> etree.tostring(t, 'utf-8')
'<a><b/></a>'
>>> t = etree.XML('<?xml version="1.0" encoding="us-ascii"?><a><b /></a>')
>>> etree.tostring(t, 'utf-8')
'<a><b/></a>'
Andreas -- Your business will assume vast proportions.
Andreas Pakulat wrote:
On 19.06.06 12:14:02, Andreas Pakulat wrote:
I think this has been asked before, but I can't find it in my archive (or the official one)..
Is it possible to tell lxml to add the <?xml version="1.0" encoding=".."?> when serializing an ElementTree? I couldn't find anything about this in the documentation.
I found the thread again and it seems this is a regression in lxml 1.0.1 here. I can't get lxml to put a Processing Instruction on top of the document, either with tostring() or with write() (providing an encoding of 'utf-8'). Not even when the original file had a PI.
Note that these are strictly speaking *not* real processing instructions but the XML prologue. Note also that if your XML document contains only ASCII or UTF-8 content, it's not required to put a prolog up at all. So, lxml is not broken in any way there.
However, Stefan did add the ability to control the writing of the declaration using the xml_declaration keyword arguments:
I.e., try the following:
etree.tostring(t, 'utf-8', xml_declaration=True)
I'm not sure whether we document this very well, though..
Regards, Martijn
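For readers who want to try the suggestion, here is a minimal sketch of the xml_declaration keyword with both tostring() and write(). The fallback import is an assumption added for illustration: it relies on the standard library's ElementTree accepting the same keyword (Python 3.8 or later), so the snippet also runs where lxml is not installed. The variable names are made up for this example.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the standard library's ElementTree (Python 3.8+)
    # accepts the same xml_declaration keyword.
    import xml.etree.ElementTree as etree

# Bytes input: lxml rejects str input that carries an encoding declaration.
root = etree.XML(b'<?xml version="1.0" encoding="us-ascii"?><a><b /></a>')

# Default behaviour: no declaration is written for UTF-8 output,
# even though the parsed document had one.
plain = etree.tostring(root, encoding="utf-8")

# Explicitly request the declaration when serializing to bytes.
with_decl = etree.tostring(root, encoding="utf-8", xml_declaration=True)

# The same keyword on ElementTree.write(), here writing to an in-memory buffer.
buf = io.BytesIO()
etree.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
written = buf.getvalue()

print(plain)
print(with_decl)
print(written)
```

The exact formatting of the declaration (quote style, trailing newline) and of empty elements may differ between lxml and the standard library, so code that consumes the output should not depend on it.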
On 19.06.06 12:48:37, Martijn Faassen wrote:
Andreas Pakulat wrote:
On 19.06.06 12:14:02, Andreas Pakulat wrote:
I think this has been asked before, but I can't find it in my archive (or the official one)..
Is it possible to tell lxml to add the <?xml version="1.0" encoding=".."?> when serializing an ElementTree? I couldn't find anything about this in the documentation.
I found the thread again and it seems this is a regression in lxml 1.0.1 here. I can't get lxml to put a Processing Instruction on top of the document, either with tostring() or with write() (providing an encoding of 'utf-8'). Not even when the original file had a PI.
Note that these are strictly speaking *not* real processing instructions but the XML prologue. Note also that if your XML document contains only ASCII or UTF-8 content, it's not required to put a prolog up at all. So, lxml is not broken in any way there.
Hmm, right. However if I have a document with an existing xml declaration and read that into lxml I'd expect to get it out again. IMHO lxml shouldn't change the document implicitly.
However, Stefan did add the ability to control the writing of the declaration using the xml_declaration keyword arguments:
I.e., try the following:
etree.tostring(t, 'utf-8', xml_declaration=True)
That works, thanks.
I'm not sure whether we document this very well, though..
This is not documented at all, there's 1 example usage but in a more or less unrelated context (explaining the encoding parameter). Andreas -- Stay the curse.
Andreas Pakulat wrote:
On 19.06.06 12:48:37, Martijn Faassen wrote: [snip]
I'm not sure whether we document this very well, though..
This is not documented at all, there's 1 example usage but in a more or less unrelated context (explaining the encoding parameter).
Bad of us, we should document new features as we create them... Feel free to contribute some text (with doctests :) for our documentation! Regards, Martijn
Hi Andreas, Andreas Pakulat wrote:
etree.tostring(t, 'utf-8', xml_declaration=True) I'm not sure whether we document this very well, though..
This is not documented at all, there's 1 example usage but in a more or less unrelated context (explaining the encoding parameter).
It's supposed to be in ET 1.3, we're just a little ahead with our releases, so the documentation is not yet in the official places. We should just add it to the docstrings. Stefan
Hi! I started using lxml some weeks ago, and have been lurking on the mailing list for some time now. Recently I had the problem that the xml prologue is not included by default, and stumbled over the following mail: On Mon, Jun 19, 2006 at 12:48:37PM +0200, Martijn Faassen wrote:
I.e., try the following:
etree.tostring(t, 'utf-8', xml_declaration=True)
Is there any reason that the method write_c14n() does not support this flag? The canonical form is a bit more readable, therefore I'd prefer to use this method. Best regards, Albert
Hi Albert, Albert Brandl wrote:
I started using lxml some weeks ago, and have been lurking on the mailing list for some time now. Recently I had the problem that the xml prologue is not included by default, and stumbled over the following mail:
On Mon, Jun 19, 2006 at 12:48:37PM +0200, Martijn Faassen wrote:
I.e., try the following:
etree.tostring(t, 'utf-8', xml_declaration=True)
Is there any reason that the method write_c14n() does not support this flag? The canonical form is a bit more readable, therefore I'd prefer to use this method.
As the documentation of the write_c14n() method states, it always writes UTF-8 encoded byte streams, so there is no real need for the prologue. I wouldn't mind adding this, though. Things like 'standalone' and the XML version would otherwise not be available in the output. BTW, if it's about the readability, pretty printing might be closer to what you want anyway. Stefan
Hi Stefan, On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
As the documentation of the write_c14n() method states, it always writes UTF-8 encoded byte streams, so there is no real need for the prologue. I wouldn't mind adding this, though. Things like 'standalone' and the XML version would otherwise not be available in the output.
I recently learned about section 4.1 of the C14N recommendation, http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical form does not contain a prologue. Therefore, write_c14n() is ok - sorry for the request.
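The point about canonical XML can be checked with a small sketch. Note the assumption: this does not use lxml's write_c14n() from the thread but the standard library's canonicalize() function (Python 3.8 or later), which implements C14N 2.0. It is only meant to illustrate that canonical output carries no XML declaration; the input string is invented for the example.

```python
from xml.etree.ElementTree import canonicalize

# A document that starts with an XML declaration, including 'standalone'.
source = '<?xml version="1.0" standalone="yes"?><a><b /></a>'

# canonicalize() returns the canonical form as a str when no output
# file is given; the declaration is dropped in the process.
canonical = canonicalize(source)

print(canonical)
```

As Stefan notes, this also means that details such as 'standalone' and the XML version are not available in canonical output.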
BTW, if it's about the readability, pretty printing might be closer to what you want anyway.
Thanks for the hint. In lxml 1.0.1, the pretty printed version adds information about the namespace to every tag. Unfortunately, this decreases the readability, since in my case, almost all tags have a namespace. A "pretty_print" flag for write_c14n() would be a perfect workaround, though :-) Best regards, Albert
Hi Albert, Albert Brandl wrote:
On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
As the documentation of the write_c14n() method states, it always writes UTF-8 encoded byte streams, so there is no real need for the prologue. I wouldn't mind adding this, though. Things like 'standalone' and the XML version would otherwise not be available in the output.
I recently learned about section 4.1 of the C14N recommendation, http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical form does not contain a prologue. Therefore, write_c14n() is ok - sorry for the request.
Thought so. Thanks for checking.
BTW, if it's about the readability, pretty printing might be closer to what you want anyway.
Thanks for the hint. In lxml 1.0.1, the pretty printed version adds information about the namespace to every tag.
Not on my side. How do you build the tree?
Unfortunately, this decreases the readability, since in my case, almost all tags have a namespace. A "pretty_print" flag for write_c14n() would be a perfect workaround, though :-)
I don't think that's gonna happen. C14N is meant to be a well-defined XML formatting style, and pretty printing is not part of the standard. Stefan
participants (4)
- Albert Brandl
- Andreas Pakulat
- Martijn Faassen
- Stefan Behnel