[Python-Dev] Encoding of code in XML

Paul Prescod paul@prescod.net
Wed, 19 Apr 2000 15:19:31 -0500

David Ascher wrote:
> ...
> If I understand what you're proposing, you're splitting a single bit of
> Python code into N XML elements.  

No, a CDATA section is not an element. But the question of whether
boundary placements are meaningful is sepearate. This comes back to the
"semantics question".

Most tools will not differentiate between two adjacent CDATA sections
and one. The XML specification does not say whether they should or
should not but in practice tools that consume XML and then throw it away
typically do NOT care about CDATA section boundaries and tools that edit
XML *do* care.

This "break it into to two sections" solution is the typical one but it
is god-awful ugly, even in XML editors.

Many stream-based XML tools (e.g. mostSAX parsers, xmllib) *will* report
two separate CDATA sections as two different character events.
Application code must be able to handle this situation. It doesn't only
occur with CDATA sections. XML parsers could equally break up long text
chunks based on 1024-byte block boundaries or line breaks or whatever
they feel like.

In my opinion these variances in behvior stem from the myth that XML has
no semantics but that's another off-topic topic.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Pop stars come and pop stars go, but amid all this change there is one
eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is
guilty as sin.
	- http://www.nj.com/page1/ledger/e2efc7.html