REALLY simple xml reader
Stefan Behnel
stefan_ml at behnel.de
Thu Jan 31 12:35:17 EST 2008
Hi,
Steven D'Aprano wrote:
> On Fri, 01 Feb 2008 00:40:01 +1100, Ben Finney wrote:
>
>> Quite apart from a human thinking it's pretty or not pretty, it's *not
>> valid XML* if the XML declaration isn't immediately at the start of the
>> document <URL:http://www.w3.org/TR/xml/#sec-prolog-dtd>. Many XML
>> parsers will (correctly) reject such a document.
>
> You know, I'd really like to know what the designers were thinking when
> they made this decision.
[had a good laugh here]
> This is legal XML:
>
> """<?xml version="1.0"?>
> <greeting>Hello, world!</greeting>"""
>
> and so is this:
>
> """
> <greeting >Hello, world!</greeting >"""
>
>
> but not this:
>
> """ <?xml version="1.0"?>
> <greeting>Hello, world!</greeting>"""
It's actually not that stupid. If you leave out the declaration, the XML is
UTF-8 encoded by spec (or UTF-16, which has to announce itself with a byte
order mark), so plain ASCII whitespace at the start is unambiguous. It's just
as if the declaration had come *before* the whitespace, at the very beginning
of the byte stream.
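If you want to check the three quoted cases against a real parser, here is a minimal sketch using the expat-based parser behind the standard library's xml.etree.ElementTree. The helper name `parses` and the variable names are made up for this example. The last document is an extra case, not from the quoted message: it has no declaration and contains non-ASCII UTF-8 bytes.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if ElementTree (expat) accepts the bytes as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# The three documents from the quoted message, as raw bytes.
with_decl = b'<?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'
no_decl = b'\n<greeting >Hello, world!</greeting >'
space_first = b' <?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'

for name, doc in [("declaration first", with_decl),
                  ("no declaration", no_decl),
                  ("space before declaration", space_first)]:
    print(name, "->", "accepted" if parses(doc) else "rejected")

# Without a declaration the parser falls back to UTF-8, so leading
# ASCII whitespace is harmless and the content decodes as UTF-8.
utf8_doc = b'\n<greeting>Gr\xc3\xbc\xc3\x9fe</greeting>'
greeting_text = ET.fromstring(utf8_doc).text
print(greeting_text)
```

Only the document with the space in front of the declaration gets rejected; the other two go through.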
But if you add a declaration, then the encoding can change for the whole
document (including the declaration itself!), so you have to give the parser a
chance to actually find and parse the declaration. It can only do that if the
declaration sits at a known position, i.e. in the very first bytes. How is it
supposed to know that the whitespace before the declaration *is* whitespace
before it knows the encoding?
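To make that concrete, here is a small sketch that prints the first bytes of "<?xm" and of a single space in a few encodings. The codec list is just my choice for illustration (cp037 is one of Python's EBCDIC codecs), and `hex_bytes` is a made-up helper. A parser can recognise the fixed byte pattern of "<?xm" at offset zero and guess the encoding family from it, but a leading space gives it nothing comparable to go on.

```python
def hex_bytes(text: str, codec: str) -> str:
    """Encode text with the given codec and show the bytes as hex."""
    return " ".join(f"{b:02x}" for b in text.encode(codec))


# Codecs chosen for illustration only.
codecs = ["utf-8", "utf-16-le", "utf-16-be", "cp037"]

for codec in codecs:
    print(f"{codec:10}  '<?xm' = {hex_bytes('<?xm', codec):24}"
          f"  ' ' = {hex_bytes(' ', codec)}")

# An EBCDIC space is the byte 0x40, which an ASCII reader sees as '@'.
ebcdic_space = " ".encode("cp037")
seen_as_ascii = ebcdic_space.decode("ascii")
print(repr(seen_as_ascii))
```

So the same "whitespace" is a different byte sequence in each encoding, and the parser can't tell which one applies until it has read the declaration that comes after it.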
Stefan