[Python-Dev] Minidom and Unicode

M.-A. Lemburg mal@lemburg.com
Sat, 01 Jul 2000 14:02:55 +0200


"Martin v. Loewis" wrote:
> 
> While trying the minidom parser from the current CVS, I found that
> repr apparently does not work for nodes:
> 
> Python 2.0b1 (#29, Jun 30 2000, 10:48:11)  [GCC 2.95.2 19991024 (release)] on linux2
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> Copyright 1995-2000 Corporation for National Research Initiatives (CNRI)
> >>> from xml.dom.minidom import parse
> >>> d=parse("/usr/src/python/Doc/tools/sgmlconv/conversion.xml")
> >>> d.childNodes
> [Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: __repr__ returned non-string (type unicode)
> 
> The problem here is that __repr__ is computed as
> 
>     def __repr__( self ):
>         return "<DOM Element:"+self.tagName+" at "+`id( self )` +" >"
> 
> and that self.tagName is u'conversion', so the resulting string is a
> unicode string.
> 
> I'm not sure whose fault that is: either __repr__ should accept
> unicode strings, or minidom.Element.__repr__ should be changed to
> return a plain string, e.g. by converting tagName to UTF-8. In any
> case, I believe __repr__ should 'work' for these objects.

Note that __repr__ has to return a plain string object (IIRC
this is checked in object.c or abstract.c). The correct way
to get there is either to wrap the result in str(...) or to
switch on the type of self.tagName and call .encode() when it
is a Unicode object.
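
A minimal sketch of the second option, in case it helps. The
Element class below is a hypothetical stand-in and not the actual
minidom code, to_native_str() is a helper name invented here, and
UTF-8 is assumed only because it was the encoding suggested above.
It is written to run on later Python 2 releases as well as on
Python 3, where the native str type is already Unicode and the
conversion goes the other way (bytes are decoded):

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native_str(value, encoding="utf-8"):
    """Return value as the interpreter's native str type (hypothetical helper)."""
    if PY2:
        # Python 2: a unicode object must be encoded to a byte string.
        if isinstance(value, unicode):  # noqa: F821 (name exists only on Python 2)
            return value.encode(encoding)
        return value
    # Python 3: str is already Unicode; only bytes need decoding.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


class Element(object):
    """Hypothetical stand-in for minidom.Element, for illustration only."""

    def __init__(self, tagName):
        self.tagName = tagName

    def __repr__(self):
        # Convert the tag name first so the result is always a native str.
        return "<DOM Element: %s at %d>" % (to_native_str(self.tagName), id(self))


node = Element(u"conversion")
print(repr(node))
print([node])  # list display calls __repr__ on each item
```

With the conversion done inside __repr__, displaying a list of
such nodes (as d.childNodes does) no longer trips the type check.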

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/