[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding

Sun Jun 21 23:42:04 CEST 2009

Fredrik Lundh <fredrik at effbot.org> added the comment:

Did you look at the 1.3 alpha code base when you came up with this idea?  
Unfortunately, 1.3's _encode is used for a different purpose...

I don't have time to test it tonight, but I suspect that 1.3's 
escape_data/escape_attrib functions might work better under 3.X; they do 
the text.replace dance first, and then an explicit text.encode(encoding, 
"xmlcharrefreplace") at the end.  E.g.

def _escape_cdata(text, encoding):
    # escape character data
    try:
        # it's worth avoiding do-nothing calls for strings that are
        # shorter than 500 character, or so.  assume that's, by far,
        # the most common case in most applications.
        if "&" in text:
            text = text.replace("&", "&amp;")
        if "<" in text:
            text = text.replace("<", "&lt;")
        if ">" in text:
            text = text.replace(">", "&gt;")
        return text.encode(encoding, "xmlcharrefreplace")
    except (TypeError, AttributeError):
        _raise_serialization_error(text)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6233>
_______________________________________