Is it possible to consume UTF8 XML documents using xml.dom.pulldom?

Peter Otten __peter__ at
Wed Jul 30 19:23:46 CEST 2008

Paul Boddie wrote:

> On 30 Jul, 18:17, Simon Willison <si... at> wrote:
>> Some very useful people in #python on Freenode pointed out that my bug
>> occurs because I'm trying to display things interactively in the
>> console. Saving to a variable instead fixes the problem.
> What's strange about that is how the object is represented when
> displayed:
> ('CHARACTERS', <DOM Text node "Simon\u2019s XM...">)
> Here, there's no attempt made to encode \u2019 as an ASCII byte
> sequence. Does the OS X version of Python do anything special with
> string representations?

I'm on Kubuntu 7.10 and see the same error as Simon. The problem is in the
minidom.CharacterData class, which has the following method:

    def __repr__(self):
        data = self.data
        if len(data) > 10:
            dotdotdot = "..."
        else:
            dotdotdot = ""
        return "<DOM %s node \"%s%s\">" % (
            self.__class__.__name__, data[0:10], dotdotdot)

The data attribute is a unicode instance...
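
To illustrate the mechanism, here is a rough sketch written for Python 3. It is not the minidom code itself: text_node_repr is a hypothetical standalone copy of the truncation logic quoted above, and the sample string is made up to match Simon's output. In Python 2, a __repr__ that returns a unicode object gets encoded with the default (ASCII) codec when the interpreter displays it; the explicit encode("ascii") call below only stands in for that implicit step.

```python
def text_node_repr(class_name, data):
    """Hypothetical standalone copy of the truncation logic quoted above."""
    dotdotdot = "..." if len(data) > 10 else ""
    return '<DOM %s node "%s%s">' % (class_name, data[0:10], dotdotdot)


# Made-up sample text containing U+2019 (right single quotation mark).
result = text_node_repr("Text", "Simon\u2019s XML feed")

# The formatted result is still a text string containing U+2019, so a
# strict ASCII encode fails. This stands in for what Python 2 does
# implicitly when __repr__ returns a unicode object.
try:
    result.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# One possible workaround (an assumption, not something proposed in the
# thread): escape non-ASCII characters so the text is safe to print.
safe = result.encode("ascii", "backslashreplace").decode("ascii")

print(ascii_ok)
print(safe)
```

If that reading is right, the failure does not depend on pulldom or on the XML being UTF-8 at all; any text node whose first ten characters include a non-ASCII character would trigger it when displayed.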

