Fwd: [XML-SIG] xmlpickle.py ?!

Fredrik Lundh <effbot@telia.com>
Wed, 9 Aug 2000 22:12:34 +0200

Tom wrote:
> >    output = string.replace(data, "]]>", "]]]><![CDATA[]>")
> Holy cow, /F!  But did you really mean
> output = string.replace(data, "]]>", "]]]><![CDATA[]]]>")

nope.  but I didn't make it clear that the idea was to put the
"output" string inside a CDATA section in the first place.

here's how it works:

1. the original "]]>" is split into two parts: "]" and "]>".

2. the "]" is put at the end of the first CDATA section, like this:

    "]" + "]]>"

3. the "]>" is put at the beginning of a second CDATA section,
like this:

    "<![CDATA[" + "]>"
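
the three steps above can be sketched as a small helper.  this is only
an illustration in current Python (str.replace instead of the string
module); the name cdata_wrap and the two constants are made up for the
example and don't come from xmlpickle.py.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"

def cdata_wrap(data):
    """Wrap data in a CDATA section, splitting every "]]>" in two.

    Hypothetical helper for illustration only.
    """
    # "]" ends up at the tail of the current section, then the section
    # is closed, a new one is opened, and "]>" becomes its first content
    escaped = data.replace(CDATA_END, "]" + CDATA_END + CDATA_START + "]>")
    return CDATA_START + escaped + CDATA_END

print(cdata_wrap("a]]>b"))
# prints: <![CDATA[a]]]><![CDATA[]>b]]>
```

data without any "]]>" in it simply comes back wrapped in a single
CDATA section.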

the reason this trick works is that "]]>" is the *only* thing that's
recognized as markup in a CDATA section (see section 2.7 of the
XML spec):


    [18]  CDSect ::=  CDStart CData CDEnd
    [19]  CDStart ::=  '<![CDATA['
    [20]  CData ::=  (Char* - (Char* ']]>' Char*))
    [21]  CDEnd ::=  ']]>'
    Within a CDATA section, only the CDEnd string is recognized
    as markup /.../


also note that

    /.../ CDATA sections cannot nest /.../

doesn't mean that you cannot put a CDStart tag inside another
CDATA section (e.g. if you're embedding XML in a CDATA section).
once the parser has started parsing the CDATA section, it will
simply skip over any embedded CDATA section -- but it will stop
at the first CDEnd tag it sees, unless you escape them as shown above.
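
to illustrate the embedding case, here's a rough sketch that puts a
small XML document (which has its own CDATA section) inside another
CDATA section and reads it back with ElementTree.  the element names
"wrapper" and "note" are invented for the example.

```python
import xml.etree.ElementTree as ET

# an inner document that already contains a CDATA section
inner = "<note><![CDATA[1 < 2]]></note>"

# the embedded CDStart needs no treatment; only the "]]>" is split
escaped = inner.replace("]]>", "]]]><![CDATA[]>")
outer = "<wrapper><![CDATA[" + escaped + "]]></wrapper>"

# ElementTree joins the character data of both CDATA sections
recovered = ET.fromstring(outer).text
assert recovered == inner

# the recovered string is itself parseable XML
assert ET.fromstring(recovered).text == "1 < 2"
```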


one drawback here is that you may end up with more than one
CDATA segment at the receiving end, so a naive reader may mess
things up.  but if it does, it's broken.
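
a minimal sketch of a reader that copes with this, using minidom (which
is expected to keep CDATA sections as separate nodes).  the helper name
read_text and the "msg" element are assumptions made for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<msg><![CDATA[a]]]><![CDATA[]>b]]></msg>")
children = doc.documentElement.childNodes

def read_text(nodes):
    """Join the data of all text and CDATA nodes (hypothetical helper)."""
    wanted = (minidom.Node.TEXT_NODE, minidom.Node.CDATA_SECTION_NODE)
    return "".join(n.data for n in nodes if n.nodeType in wanted)

# a naive reader that only looks at the first child may see a fragment
print(len(children), repr(children[0].data))

# a reader that joins all segments gets the original string back
print(repr(read_text(children)))
```

joining every segment is what the receiving end has to do anyway, since
a parser is free to deliver character data in more than one piece.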