REALLY simple xml reader

Stefan Behnel stefan_ml at behnel.de
Thu Jan 31 12:35:17 EST 2008


Hi,

Steven D'Aprano wrote:
> On Fri, 01 Feb 2008 00:40:01 +1100, Ben Finney wrote:
> 
>> Quite apart from a human thinking it's pretty or not pretty, it's *not
>> valid XML* if the XML declaration isn't immediately at the start of the
>> document <URL:http://www.w3.org/TR/xml/#sec-prolog-dtd>. Many XML
>> parsers will (correctly) reject such a document.
> 
> You know, I'd really like to know what the designers were thinking when 
> they made this decision.
[had a good laugh here]
> This is legal XML:
> 
> """<?xml version="1.0"?>
> <greeting>Hello, world!</greeting>"""
> 
> and so is this:
> 
> """
>      <greeting       >Hello, world!</greeting    >"""
> 
> 
> but not this:
> 
> """ <?xml version="1.0"?>
> <greeting>Hello, world!</greeting>"""

It's actually not that stupid. When you leave out the declaration (and there
is no byte order mark or external encoding information), the XML is UTF-8
encoded by spec, so normal ASCII whitespace doesn't matter. It's just as if
the declaration had come *before* the whitespace, at the very beginning of
the byte stream.
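
The three cases quoted above are easy to check against a real parser. This is
only a quick sketch using the standard library's xml.etree.ElementTree; the
helper name is_well_formed is made up for illustration, and other parsers may
report the error differently.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the standard-library parser accepts `data`."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# The three documents from the quoted message, as byte strings.
with_declaration = b'<?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'
no_declaration = b'\n     <greeting       >Hello, world!</greeting    >'
space_then_declaration = b' <?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'

for name, data in [
    ("declaration first", with_declaration),
    ("no declaration, leading whitespace", no_declaration),
    ("whitespace before declaration", space_then_declaration),
]:
    print(name, "->", is_well_formed(data))
```

The first two should be accepted and the third rejected, which matches what
Ben said about conforming parsers.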

But if you add a declaration, then the encoding can change for the whole
document (including the declaration!), so you have to give the parser a chance
to actually parse the declaration. How is it supposed to know that the
whitespace before the declaration *is* whitespace before it knows the encoding?
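
To make that concrete, here is a rough sketch of how a parser might guess the
encoding family from the first bytes of the stream, loosely following the
autodetection idea in the (non-normative) Appendix F of the XML spec. The
function name, the candidate list and the omission of byte order marks are
simplifying assumptions of mine, not how any particular parser does it.

```python
# Illustrative, incomplete list of encodings to try (an assumption).
CANDIDATES = ("utf-8", "utf-16-le", "utf-16-be", "cp037")


def sniff_declaration_encoding(data: bytes):
    """Guess the encoding family by matching the bytes of '<?xml'.

    Returns a codec name from CANDIDATES, or None if the stream does not
    start with '<?xml' in any of them.
    """
    for codec in CANDIDATES:
        signature = "<?xml".encode(codec)  # what '<?xml' looks like in this codec
        if data.startswith(signature):
            return codec
    return None


doc = '<?xml version="1.0" encoding="utf-16-le"?><greeting/>'

# Declaration at the very start: the byte pattern identifies the family.
print(sniff_declaration_encoding(doc.encode("utf-16-le")))

# A leading space shifts the pattern, so nothing matches any more.
print(sniff_declaration_encoding((" " + doc).encode("utf-16-le")))
```

The point is that the parser can only recognise "<?xml" because it knows
exactly which characters must come first. Once arbitrary whitespace is
allowed in front, it would have to guess what a space looks like in an
encoding it has not identified yet.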

Stefan


