[lxml-dev] let lxml write the ?xml pi

Hi, I think this has been asked before, but I can't find it in my archive (or the official one). Is it possible to tell lxml to add the <?xml version="1.0" encoding=".."?> line when serializing an ElementTree? I couldn't find anything about this in the documentation. Andreas -- You have taken yourself too seriously.

On 19.06.06 12:14:02, Andreas Pakulat wrote:
I found the thread again, and it seems this is a regression in lxml 1.0.1 here. I can't get lxml to put a Processing Instruction on top of the document, neither with tostring() nor with write() (providing an encoding of 'utf-8'). Not even when the original file had a <?xml ..> PI.
Andreas -- Your business will assume vast proportions.

Andreas Pakulat wrote:
Note that, strictly speaking, this is *not* a real processing instruction but the XML declaration in the prologue. Note also that if your XML document contains only ASCII or UTF-8 content, it's not required to write a declaration at all. So lxml is not broken in any way there. However, Stefan did add the ability to control the writing of the declaration using the xml_declaration keyword argument. I.e., try the following:
etree.tostring(t, 'utf-8', xml_declaration=True)
I'm not sure whether we document this very well, though. Regards, Martijn
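
A minimal sketch of the suggestion above, comparing output with and without the xml_declaration flag for both tostring() and write(). The element names are made up for illustration, and the fallback import is an assumption added only so the snippet runs where lxml is not installed (the standard library's ElementTree accepts the same keyword in Python 3.8 and later).

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib module, which accepts the same
    # xml_declaration keyword for tostring() and write() in Python 3.8+.
    import xml.etree.ElementTree as etree

# Hypothetical document used only for illustration.
root = etree.Element("doc")
etree.SubElement(root, "item").text = "hello"

# Without the flag, UTF-8 output carries no XML declaration.
plain = etree.tostring(root, encoding="utf-8")

# With the flag, the declaration is written on top of the document.
declared = etree.tostring(root, encoding="utf-8", xml_declaration=True)

# The same keyword works when writing a whole tree to a file-like object.
buf = io.BytesIO()
etree.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
written = buf.getvalue()

print(plain)
print(declared)
print(written)
```

The flag is passed explicitly here because, as noted in the message above, the declaration is optional for ASCII and UTF-8 documents and is left out unless requested.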

On 19.06.06 12:48:37, Martijn Faassen wrote:
Hmm, right. However, if I have a document with an existing XML declaration and read that into lxml, I'd expect to get it out again. IMHO lxml shouldn't change the document implicitly.
That works, thanks.
I'm not sure whether we document this very well, though..
This is not documented at all. There's one example of its use, but in a more or less unrelated context (explaining the encoding parameter). Andreas -- Stay the curse.

Hi! I started using lxml some weeks ago, and have been lurking on the mailing list for some time now. Recently I had the problem that the xml prologue is not included by default, and stumbled over the following mail: On Mon, Jun 19, 2006 at 12:48:37PM +0200, Martijn Faassen wrote:
I.e., try the following:
etree.tostring(t, 'utf-8', xml_declaration=True)
Is there any reason that the method write_c14n() does not support this flag? The canonical form is a bit more readable, therefore I'd prefer to use this method. Best regards, Albert

Hi Albert, Albert Brandl wrote:
As the documentation of the write_c14n() method states, it always writes UTF-8 encoded byte streams, so there is no real need for the prologue. I wouldn't mind adding this, though. Things like 'standalone' and the XML version would otherwise not be available in the output. BTW, if it's about the readability, pretty printing might be closer to what you want anyway. Stefan

Hi Stefan, On Sat, Jul 01, 2006 at 02:53:03PM +0200, Stefan Behnel wrote:
I recently learned about section 4.1 of the C14N recommendation, http://www.w3.org/TR/xml-c14n#NoXMLDecl, which states that the canonical form does not contain a prologue. Therefore, write_c14n() is ok - sorry for the request.
BTW, if it's about the readability, pretty printing might be closer to what you want anyway.
Thanks for the hint. In lxml 1.0.1, the pretty-printed version adds information about the namespace to every tag. Unfortunately, this decreases the readability, since in my case almost all tags have a namespace. A "pretty_print" flag for write_c14n() would be a perfect workaround, though :-) Best regards, Albert
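
The point about canonical XML carrying no declaration can be checked with a short sketch. Note the substitution: this uses the standard library's canonicalize() function (Python 3.8+), which implements C14N 2.0, rather than lxml's write_c14n() or the C14N 1.0 text cited above; the assumption is that both versions drop the declaration in the same way. The sample document is hypothetical.

```python
# Stdlib only: canonicalize() implements C14N 2.0, not lxml's write_c14n().
from xml.etree.ElementTree import canonicalize

# Hypothetical input that starts with an XML declaration.
source = '<?xml version="1.0" encoding="utf-8"?><root b="2" a="1"><child/></root>'

# The canonical form is returned as a string when no output file is given.
canonical = canonicalize(source)

print(canonical)
```

The printed result begins directly with the root element, so any declaration has to be added by the caller after canonicalization if one is wanted.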

participants (4)
- Albert Brandl
- Andreas Pakulat
- Martijn Faassen
- Stefan Behnel