Accented letters in XML

Alessio Pace puccio_13 at yahoo.it
Fri May 2 05:42:28 EDT 2003


Hi, I wrote a configuration file for an application in XML format.
Some xml.dom.minidom.Text nodes must contain accented letters (they are real words),
so I write them directly: à, è and so on.
When I parse the XML file, Python (2.3a2) drops the accented character (which,
when it occurs in a word, is always at the end).
For example: 

<word>perchè</word>

Python returns the string u'perch'.
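One way to check the parser on its own is a minimal sketch like the following. It assumes a current Python 3 interpreter rather than 2.3a2. The same element, encoded as real UTF-8 bytes, comes back with the accent intact. The same text saved as Latin-1 under a UTF-8 declaration is rejected with an error rather than silently truncated.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

DECL = '<?xml version="1.0" encoding="UTF-8"?>'
doc_text = DECL + '<word>perchè</word>'

# Correctly encoded: the bytes really are UTF-8, matching the declaration.
doc = minidom.parseString(doc_text.encode('utf-8'))
word = doc.documentElement.firstChild.data
print(repr(word))

# Mismatched: bytes saved as Latin-1 (è becomes the single byte 0xE8)
# while the declaration still says UTF-8; expat reports a malformed document.
try:
    minidom.parseString(doc_text.encode('latin-1'))
    latin1_rejected = False
except ExpatError as exc:
    latin1_rejected = True
    print('Latin-1 bytes rejected:', exc)
```

If the real file parses without an error but still loses the letter, a wrong file encoding is less likely to be the cause than how the text nodes are read afterwards.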

The XML file is declared to be in UTF-8, and this is how I read the data of the Text
elements (suggestions for modifications are welcome):

import sets
from xml.dom import minidom

words = sets.Set()                                  # collected word strings
wordsList = xmldoc.getElementsByTagName('word')     # xmldoc: the parsed document
for element in wordsList:                           # for each <word> element
    for child in element.childNodes:                # its children
        if isinstance(child, minidom.Text):         # keep only text nodes
            words.add(child.data)
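The following variant of the loop joins all text children of each <word> before adding the result to the set. This way a word comes out whole even if the DOM holds it as several adjacent Text nodes. The DOM allows such splits, but it is only an assumption that one happens here. The sketch is written for Python 3, where the built-in set replaces sets.Set. The helper names text_of and collect_words are hypothetical.

```python
from xml.dom import minidom

def text_of(element):
    """Join the data of all direct Text children of element (hypothetical helper)."""
    return ''.join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)

def collect_words(xmldoc):
    """Return the set of strings held by the <word> elements of xmldoc."""
    return {text_of(element) for element in xmldoc.getElementsByTagName('word')}

source = '<words><word>perchè</word><word>città</word></words>'
doc = minidom.parseString(source.encode('utf-8'))

# Add a <word> whose text is deliberately stored as two adjacent Text nodes,
# which the per-child loop would record as 'perch' and 'è' separately.
split = doc.createElement('word')
split.appendChild(doc.createTextNode('perch'))
split.appendChild(doc.createTextNode('è'))
doc.documentElement.appendChild(split)

words = collect_words(doc)
print(sorted(words))  # ['città', 'perchè']
```

Another way to merge adjacent Text nodes is to call xmldoc.normalize() before running the original per-child loop.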


Thanks.

-- 
bye
Alessio Pace




More information about the Python-list mailing list