Is it possible to consume UTF8 XML documents using xml.dom.pulldom?
__peter__ at web.de
Wed Jul 30 19:23:46 CEST 2008
Paul Boddie wrote:
> On 30 Jul, 18:17, Simon Willison <si... at simonwillison.net> wrote:
>> Some very useful people in #python on Freenode pointed out that my bug
>> occurs because I'm trying to display things interactively in the
>> console. Saving to a variable instead fixes the problem.
> What's strange about that is how the object is represented when
> ('CHARACTERS', <DOM Text node "Simon\u2019s XM...">)
> Here, there's no attempt made to encode \u2019 as an ASCII byte
> sequence. Does the OS X version of Python do anything special with
> string representations?
I'm on Kubuntu 7.10 and see the same error as Simon. The problem is in the
minidom.CharacterData class which has the following method:

    def __repr__(self):
        data = self.data
        if len(data) > 10:
            dotdotdot = "..."
        else:
            dotdotdot = ""
        return "<DOM %s node \"%s%s\">" % (
            self.__class__.__name__, data[0:10], dotdotdot)

The data attribute is a unicode instance...
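
For anyone who only wants to inspect such a node in a console that cannot display U+2019, here is a rough sketch of an ASCII-only stand-in for that method. The helper name ascii_safe_repr and the sample document are made up for illustration; neither is part of minidom or of the original report. It uses minidom.parseString for brevity, on the assumption that the Text nodes pulldom hands back are of the same minidom class.

```python
from xml.dom import minidom


def ascii_safe_repr(node):
    """Hypothetical helper: same layout as CharacterData.__repr__,
    but with non-ASCII characters escaped instead of passed through."""
    data = node.data
    dotdotdot = "..." if len(data) > 10 else ""
    # backslashreplace turns U+2019 into the ASCII characters \u2019
    preview = data[0:10].encode("ascii", "backslashreplace").decode("ascii")
    return '<DOM %s node "%s%s">' % (
        node.__class__.__name__, preview, dotdotdot)


# Sample UTF-8 input (an assumption; the real feed is not shown in the thread).
xml_bytes = u"<title>Simon\u2019s XML feed</title>".encode("utf-8")
doc = minidom.parseString(xml_bytes)
text_node = doc.documentElement.firstChild

summary = ascii_safe_repr(text_node)
print(summary)
```

The result contains only ASCII characters, so it can be printed regardless of the console encoding. The node's data attribute itself is left untouched.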