[XML-SIG] Change in reporting of CDATA sections
Andrew Clover
and-xml at doxdesk.com
Sat May 22 18:23:00 EDT 2004
Uche Ogbuji <uche.ogbuji at fourthought.com> wrote:
> I disagree, and I use CDATA sections a lot. Try writing an article
> about XML *in* XML (e.g. XHTML). You might also become a fan :-)
I think that's the toolchain's job. In an ideal world there'd be an XML
editor that wasn't awful (!) but it's easy enough with a decent text
editor to write some XML, select it and encode/decode the offending
characters.
S'what I do, anyway. :-)
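For anyone who would rather script that encode/decode step than rely on an editor macro, here is a minimal sketch using the standard library's xml.sax.saxutils. The sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical snippet of markup to be quoted inside another XML document.
snippet = '<p class="x">a & b</p>'

# escape() replaces &, < and > with entity references;
# quote characters are left alone unless an extra mapping is passed.
encoded = escape(snippet)

# unescape() reverses the same three replacements.
decoded = unescape(encoded)

print(encoded)
print(decoded)
```

The round trip gives back the original text, which is all the editor trick does.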
> As long as people understand that they're a simple lexical convenience,
> I'm not sure what their harm is.
You're right: at an XML-parsing level they're not too bad, but still
only a rather minor convenience. The problem is that they add complexity
without completely solving the problem - if you are writing an XML
article about CDATA sections, for example, you can't use a literal ']]>'!
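To make the ']]>' limitation concrete, here is a rough sketch of the usual workaround: break the text at each ']]>' so that ']]' ends one section and '>' starts the next. The helper name cdata_wrap is hypothetical, not part of any library.

```python
from xml.dom import minidom, Node


def cdata_wrap(text):
    """Wrap text in CDATA, splitting wherever a literal ']]>' occurs."""
    # ']]' closes the current section and '>' opens the next one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


wrapped = cdata_wrap("x ]]> y")
doc = minidom.parseString("<a>" + wrapped + "</a>")
children = doc.documentElement.childNodes

# The original string survives, but only as two adjacent sections.
recovered = "".join(child.data for child in children)
section_count = sum(
    1 for child in children if child.nodeType == Node.CDATA_SECTION_NODE
)
print(wrapped, recovered, section_count)
```

So the "simple lexical convenience" ends up needing an escaping scheme of its own.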
> I'm not sure any level of DOM has a sane treatment of CDATA sections
I'm with you here: it's the DOM that's the real problem. Aside from
CDATA sections defeating text normalisation, the DOM3 rules for
splitting them around ']]>' and out-of-encoding characters are an extra
annoyance and a likely source of bugs for implementations.
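The normalisation point can be seen with a short sketch against xml.dom.minidom from the standard library. The document is a made-up example, and other DOM implementations may behave differently.

```python
from xml.dom import minidom, Node

# Text, then a CDATA section, then more text, all inside one element.
doc = minidom.parseString("<a>foo<![CDATA[bar]]>baz</a>")
root = doc.documentElement

# normalize() merges adjacent Text nodes, but a CDATASection is a
# different node type, so it keeps the two Text nodes apart.
root.normalize()

kinds = [child.nodeType for child in root.childNodes]
text = "".join(child.data for child in root.childNodes)
print(kinds, text)
```

Code that expects a single text child after normalize() has to cope with three nodes here, even though the content is logically one string.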
The legacy nonsense from DTDs is a much worse issue in my book: it turns
XML from a simple, easy-to-grok-and-knock-up-a-noddy-parser-for notation
into a maze of twisty little bugs, all alike.
Manifesto for a cleaner XML more suited to simple tasks (ohmygod
Microsoft want to put XML in the DNS argh etc.):
- no doctypes
    DTD validation is underpowered and ineffective for namespaces.
    Validation should be done as a layer on top of XML (Schema, RNG),
    not as part of the basic required syntax.
- no entity references
    Most common use case, named character escapes: character
    references are almost as convenient, and anyway you should be
    using an encoding that doesn't require you to escape them.
    Further use case, inclusions: use XInclude or a similar
    processing layer on top of XML. Entity references are not worth
    the *enormous* complexity they add to the DOM (if implemented
    completely, anyway).
- no default attribute values
    How hard is it for an application to take null (or '') for an
    answer?
- no CDATA sections
    At least at a DOM level.
- no attribute normalisation
    Seems to be barely used, and confuses DOM a treat.
- xmlns: declarations on the root element only, unique URIs
    Being able to reuse prefixes over the document for e.g.
    inclusions is not worth the pain of namespace fixup and the
    broken interaction between DOM1 and DOM2 methods.
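Two of the items above, default attribute values and attribute normalisation, are easy to trip over because the parser applies them silently. A small sketch with xml.etree.ElementTree (expat underneath) follows. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares a default for 'kind'.
# The 'note' attribute holds a literal tab and a tab written as &#9;.
doc = (
    '<!DOCTYPE item [<!ATTLIST item kind CDATA "plain">]>'
    '<item note="a\tb&#9;c"/>'
)
item = ET.fromstring(doc)

# 'kind' appears although the start tag never mentions it.
kind = item.get("kind")

# Attribute-value normalisation turns the literal tab into a space,
# while the tab given as a character reference is kept.
note = item.get("note")

print(repr(kind), repr(note))
```

Neither value matches what is literally written in the start tag, and a DOM that wants to round-trip the document has to track the difference.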
any I missed?
Been having a grim day tracking down obscure DOM bugs and interactions,
hope everyone is having a fun weekend. I'll stop ranting now then.
--
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/