When is an xml declaration required?

Hi all

I am using lxml to serialise some fairly complex structures, store them offline in a database, and read them in at runtime when required, using etree.fromstring().

Up to now I have been hand-crafting the xml, and I assumed from what I had read that it had to start with an xml declaration, such as -

<?xml version="1.0" encoding="UTF-8"?>

I am now turning to generating the xml programmatically, and I see that you have to specify a declaration if you want one. I tried it without creating one and it works just the same.

Therefore I realise that I do not understand when a declaration is required, and when it is not.

Please can someone explain it to me.

Thanks

Frank Millman

Frank Millman, 16.11.2013 11:47:
[...]
Therefore I realise that I do not understand when a declaration is required, and when it is not.
Please can someone explain it to me.
Sure. The XML declaration has three possible parameters: version, encoding and standalone. The only one that is required is the version, *iff* there is a declaration at all.

All three parameters have default values as follows:

version="1.0" encoding="UTF-8" standalone="no"

Meaning, if there is no declaration, this is what you get. If there is one, you have to state the XML version first and can then pass none, one or both of the other two options.

The XML spec allows one other case, though. If the input data starts with a BOM byte sequence (byte order mark), that BOM defines the encoding all by itself, so you can also have UTF-16 encoded XML without a declaration, for example, if you use the appropriate BOM instead. But that's a bit rare out there in the wild.

You can find all the beautiful little details here:

http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

Stefan
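
The defaults described above can be checked with a few lines of Python. This is only a sketch: the sample document and variable names are made up for illustration, and the fallback import assumes that the standard library's ElementTree behaves like lxml for these particular calls. The ISO-8859-1 part is an addition that follows from UTF-8 being the default: an encoding that is neither UTF-8 nor announced by a BOM has to be named in the declaration.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib API is compatible for these calls
    import xml.etree.ElementTree as etree

# Hypothetical sample document with non-ASCII text, so the encoding matters.
text = "<greeting>Grüße</greeting>"

# 1. No declaration: the parser falls back to XML 1.0 and UTF-8.
no_decl = text.encode("utf-8")

# 2. Explicit declaration that merely restates the defaults.
with_decl = b'<?xml version="1.0" encoding="UTF-8"?>' + no_decl

# 3. No declaration, but UTF-16 data announced by a BOM
#    (Python's "utf-16" codec prepends the BOM when encoding).
utf16_bom = text.encode("utf-16")

results = [etree.fromstring(data).text for data in (no_decl, with_decl, utf16_bom)]
print(results)  # all three parse to the same text

# 4. ISO-8859-1 bytes without a declaration are not valid UTF-8,
#    so the parser rejects them ...
latin1 = text.encode("iso-8859-1")
try:
    etree.fromstring(latin1)
    latin1_rejected = False
except etree.ParseError:
    latin1_rejected = True

# ... while the same bytes parse once the declaration names the encoding.
latin1_with_decl = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1
print(latin1_rejected, etree.fromstring(latin1_with_decl).text)
```

Note that the input is passed as bytes throughout, so that the parser itself has to work out the encoding.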

----- Original Message ----- From: "Stefan Behnel" <stefan_ml@behnel.de> To: <lxml@lxml.de> Sent: Saturday, November 16, 2013 3:45 PM Subject: Re: [lxml] When is an xml declaration required?
[...]
Thanks so much for the detailed explanation, Stefan. I do not require anything other than the default, so I can just leave it out.

Frank
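
For the store-and-reload use case in this thread, a round trip without a declaration might look like the sketch below. The element and attribute names are hypothetical, and it assumes a Python version whose standard-library tostring() accepts xml_declaration (3.8 or later) in case lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib API is compatible for these calls
    import xml.etree.ElementTree as etree

# Build a small tree programmatically (names are illustrative only).
root = etree.Element("order", id="42")
etree.SubElement(root, "item", qty="2").text = "Käse"

# Serialise as UTF-8 bytes, once without and once with a declaration.
plain = etree.tostring(root, encoding="UTF-8", xml_declaration=False)
declared = etree.tostring(root, encoding="UTF-8", xml_declaration=True)

print(plain.startswith(b"<?xml"), declared.startswith(b"<?xml"))

# Both forms read back to the same data, because the declaration
# would only restate the defaults.
restored = etree.fromstring(plain)
restored_declared = etree.fromstring(declared)
print(restored.find("item").text, restored_declared.find("item").text)
```

Leaving the declaration out is safe here only because the stored bytes are UTF-8; if the data were written in another encoding, the declaration (or a BOM for UTF-16) would be needed to read it back.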