[XML-SIG] problem with elementtree 1.2.6

Chris Withers chris at simplistix.co.uk
Wed Nov 28 22:46:04 CET 2007

Stefan Behnel wrote:
> Chris Withers wrote:
>>> the following entities are predefined: &amp; (&) &lt; (<) &gt; (>)
>>> &quot; (") &apos; ('). 
>> Okay, so in the above, if I really mean &lt;, the xml should be:
>> '<xml>&amp;lt;/&amp;gt;</xml>'
>> Seems a little clunky, but okay...
> That's how escaping works, be it in XML, encodings, compression, whatever.

Well, yes and no. I'd expect escaping to work such that whatever we're 
dealing with can be round-tripped, i.e. parsed, serialized, parsed 
again, etc.
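
To make the round-trip expectation concrete, here is a minimal sketch. It
assumes the standard library's xml.etree.ElementTree rather than the
standalone elementtree 1.2.6 package discussed in this thread (which may
behave differently), and the helper name round_trip is made up for
illustration:

```python
import xml.etree.ElementTree as ET


def round_trip(source):
    """Parse, serialise, and parse again; return what was seen at each step."""
    first = ET.fromstring(source)                        # parse
    serialised = ET.tostring(first, encoding="unicode")  # serialise
    second = ET.fromstring(serialised)                   # parse again
    return first.text, serialised, second.text


first_text, serialised, second_text = round_trip("<span>&lt;b&gt;</span>")

# The parsed text holds the literal characters, the serialised form
# re-escapes them, and the second parse recovers the same text.
print(repr(first_text))
print(serialised)
print(repr(second_text))
```

If the serialiser did not re-escape the text, the second parse would see an
unclosed <b> tag instead of character data.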

>> I guess this was causing me problems as I'm working on a bug in Twiddler 
>> (http://www.simplistix.co.uk/software/python/twiddler)
>> where quoted html was ending up unquoted after processing:
>>  >>> from twiddler import Twiddler
>>  >>> t = Twiddler('<span>&lt;b&gt;</span>')
>>  >>> t.render()
>> u'<span><b></span>'
> If render() is supposed to serialise a correct HTML or XML tag structure then
> this is a bug.

Indeed, although the bug turned out to be in the tree builder used as 
part of the parsing process.

>> Now, I see how you fixed this in ElementTree by re-escaping all the 
>> predefined entities (out of interest, why is the function called 
>> _escape_cdata rather than _escape_data?)
> You can read the SGML spec regarding CDATA.

Not sure what that's supposed to mean. CDATA for me means stuff inside a 
<![CDATA[ ]]> section. _escape_cdata is used for everything inside any 
tag that isn't another tag.
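
A small sketch of the distinction, as far as I understand it. It assumes the
standard library's xml.etree.ElementTree and xml.sax.saxutils.escape, not the
private _escape_cdata function itself: a CDATA section and escaped character
data parse to the same text, and on output that text is escaped rather than
wrapped in a CDATA section again.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Two spellings of the same character data.
from_cdata = ET.fromstring("<span><![CDATA[<b>]]></span>")
from_escaped = ET.fromstring("<span>&lt;b&gt;</span>")

# After parsing, the tree no longer knows which spelling was used.
same_text = from_cdata.text == from_escaped.text

# Serialising escapes the text; the CDATA section is not reproduced.
serialised = ET.tostring(from_cdata, encoding="unicode")

# escape() handles &, < and > in ordinary character data.
escaped = escape("<b> & co")

print(same_text, serialised, escaped)
```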

>> but I can't do that because I 
>> want users to be able to insert chunks of html and choose whether or not 
>> they are escaped:
>>  >>> t = Twiddler('<span id="something"/>')
>> escaping:
>>  >>> t['something'].replace('<b>')
> What an odd API.

It actually works pretty well and might make more sense in context, have 
a look at the presentation on it:


>> no escaping:
>>  >>> t['something'].replace('<b>',filters=())
>>  >>> t.render()
>> u'<span id="something"><b></span>'
> I consider it bad practice to write serialised HTML into an HTML template. 

I and many others do not ;-) When writing content into an html template, 
that content often comes from other sources that spit out lumps of html. 
Being able to insert them without escaping is a common use case.
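
As a rough illustration of that use case, and not Twiddler's actual
implementation, the sketch below uses a hypothetical replace_content helper
whose filters parameter mirrors the one in the examples quoted above. It
works on plain strings and assumes html.escape from the standard library as
the default filter:

```python
from html import escape


def replace_content(template, value, filters=(escape,)):
    """Hypothetical helper: run value through each filter, then substitute it."""
    for apply_filter in filters:
        value = apply_filter(value)
    return template.replace("{content}", value)


template = '<span id="something">{content}</span>'

# Default: the inserted text is escaped.
escaped = replace_content(template, "<b>")

# Empty filter tuple: the lump of html is inserted untouched.
raw = replace_content(template, "<b>", filters=())

print(escaped)
print(raw)
```

The caller decides per insertion whether the content is trusted markup or
text that needs escaping.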

> It
> prevents the templating system from seeing the complete tag structure, which
> allows you to output broken HTML without noticing. 

That's true, sometimes. That inserted lump may have come from a process 
which can only spit out perfect html fragments, in which case you're 
fine, or it may come from user input, in which case you're doomed but 
will likely have happy customers ;-)

> Doesn't Twiddler provide a way to insert a tag tree fragment rather than a
> serialised tag string?

Yep, sure, that's what the clone method is for...
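
For comparison, inserting a tree fragment instead of a serialised string
might look like the sketch below. It uses the standard library's
xml.etree.ElementTree as a stand-in, so it says nothing about how Twiddler's
clone method behaves:

```python
import xml.etree.ElementTree as ET

span = ET.fromstring('<span id="something"/>')

# Build the fragment as elements, so the serialiser sees the whole structure.
bold = ET.SubElement(span, "b")
bold.text = "1 < 2"  # plain text; escaped automatically on output

result = ET.tostring(span, encoding="unicode")
print(result)
```

The output is well-formed by construction, at the cost of not being able to
drop in markup that only exists as a string.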

>> What extra hooks get called as a result of calling UseForeignDTD?
> Have you tried reading the docs or the source?

Docs yes, source no. I don't read C anymore :-(

Little help?



Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk
