REALLY simple xml reader

Stefan Behnel stefan_ml at
Thu Jan 31 18:35:17 CET 2008


Steven D'Aprano wrote:
> On Fri, 01 Feb 2008 00:40:01 +1100, Ben Finney wrote:
>> Quite apart from a human thinking it's pretty or not pretty, it's *not
>> valid XML* if the XML declaration isn't immediately at the start of the
>> document <URL:>. Many XML
>> parsers will (correctly) reject such a document.
> You know, I'd really like to know what the designers were thinking when 
> they made this decision.
[had a good laugh here]
> This is legal XML:
> """<?xml version="1.0"?>
> <greeting>Hello, world!</greeting>"""
> and so is this:
> """
>      <greeting       >Hello, world!</greeting    >"""
> but not this:
> """ <?xml version="1.0"?>
> <greeting>Hello, world!</greeting>"""
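The three snippets quoted above can be checked against a real parser. This is a minimal sketch using the standard library's xml.etree.ElementTree, which wraps expat. The helper is_well_formed and the constant names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The three documents from the quoted message, as raw bytes.
WITH_DECL = b'<?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'
LEADING_WS_NO_DECL = b'\n     <greeting       >Hello, world!</greeting    >'
LEADING_WS_BEFORE_DECL = b' <?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'


def is_well_formed(data: bytes) -> bool:
    """Return True if ElementTree's expat-based parser accepts the data."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


for name, doc in [("WITH_DECL", WITH_DECL),
                  ("LEADING_WS_NO_DECL", LEADING_WS_NO_DECL),
                  ("LEADING_WS_BEFORE_DECL", LEADING_WS_BEFORE_DECL)]:
    print(name, is_well_formed(doc))
```

Only the last one should be rejected. Expat reports it as an XML declaration that is not at the start of the entity.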

It's actually not that stupid. When you leave out the declaration, the XML
must be UTF-8 encoded (by spec, unless it starts with a byte order mark), so
normal ASCII whitespace at the start doesn't matter. It's just as if the
declaration had come *before* the whitespace, at the very beginning of the
byte stream.
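A small sketch of the "no declaration means UTF-8" rule, again using xml.etree.ElementTree. The sample text and variable names are illustrative only. Bytes without a declaration are decoded as UTF-8, so Latin-1 bytes for the same text are rejected. Adding a declaration at byte 0 that names ISO-8859-1 makes the same bytes acceptable.

```python
import xml.etree.ElementTree as ET

TEXT = "<greeting>Grüße, world!</greeting>"

# No declaration, UTF-8 bytes: decoded as UTF-8, as the spec requires.
utf8_text = ET.fromstring(TEXT.encode("utf-8")).text

# No declaration, Latin-1 bytes ('ü' is 0xFC, 'ß' is 0xDF): not valid UTF-8.
latin1 = TEXT.encode("latin-1")
try:
    ET.fromstring(latin1)
    latin1_accepted = True
except ET.ParseError:
    latin1_accepted = False

# Same bytes, but now a declaration at the very start names the encoding.
declared_text = ET.fromstring(
    b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1
).text

print(utf8_text, latin1_accepted, declared_text)
```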

But if you add a declaration, then the encoding can change for the whole
document (including the declaration!), so you have to give the parser a chance
to actually parse the declaration. How is it supposed to know that the
whitespace before the declaration *is* whitespace before it knows the encoding?
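The chicken-and-egg problem can be made concrete with a simplified version of the encoding autodetection from the (non-normative) Appendix F of the XML 1.0 spec. This is a sketch, not what expat or any other parser actually does. It handles only BOMs, the UTF-16 cases and the ASCII-compatible `<?xm` case, and skips UCS-4 and EBCDIC. The function name sniff_xml_encoding is hypothetical. The guess only works because the first bytes are expected to be `<?xml`, so a single leading space in UTF-16 throws it off.

```python
import re

# Pulls the encoding name out of an ASCII-compatible XML declaration.
_ENCODING_RE = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][\w.-]*)["']""")


def sniff_xml_encoding(data: bytes) -> str:
    """Guess a document's encoding from its first bytes (simplified Appendix F)."""
    # 1. A byte order mark settles it immediately.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    head = data[:4]
    # 2. No BOM: '<?' at byte 0 in a 16-bit encoding.
    if head == b"\x00<\x00?":
        return "utf-16-be"
    if head == b"<\x00?\x00":
        return "utf-16-le"
    # 3. '<?xm' in an ASCII-compatible encoding: the declaration itself
    #    can now be read to find the real encoding name.
    if head == b"<?xm":
        match = _ENCODING_RE.match(data)
        return match.group(1).decode("ascii").lower() if match else "utf-8"
    # 4. Anything else, including leading whitespace: the UTF-8 default.
    return "utf-8"


decl = '<?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'
print(sniff_xml_encoding(decl.encode("utf-16-le")))          # utf-16-le
print(sniff_xml_encoding((" " + decl).encode("utf-16-le")))  # utf-8 (wrong guess)
```

With the leading space, the first four bytes are `20 00 3C 00`. These match none of the patterns, so the sniffer falls back to UTF-8 and misreads the whole document.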


More information about the Python-list mailing list