[XML-SIG] converting iso8859-1 to UTF-8

Alexandre Fayolle alf@logilab.com
Tue, 26 Sep 2000 09:55:21 +0200 (CEST)


On Tue, 26 Sep 2000, Martin v. Loewis wrote:

> > I'm using python 1.5.2 and pyxml0.5.5.1, and I'm encountering an encoding
> > problem with 4DOM. I can use
> > document.createTextNode(string) with an iso-8859-1 string. However when I
> > use xml.dom.ext.Print, everything fails, because the text node is not
> > UTF-8 encoded. How can I convert iso-8859-1 strings to UTF-8?
> 
> Can you give a small example of what exactly you did and what exactly
> happened?

I forgot to mention I was using 4Suite 0.9.0. The problem may not occur
with older versions of 4DOM (I don't know which one is included in
pyxml). To find out, just check whether Print (in xml.dom.ext.__init__.py)
takes an 'encoding' argument.

This is being sorted out on 4Suite's mailing list. Basically what I have
is something like:

from xml.dom.ext.reader import Sax2
from xml.dom.ext import Print
d = Sax2.FromXml('''<?xml version="1.0" encoding="iso-8859-1"?>
<document>élévation</document>''')
# The following works fine, because the parser converted my iso-8859-1
# characters to UTF-8
Print(d)
# To get readable output, I have to use:
Print(d, encoding='iso-8859-1')

# Now if I create nodes manually, I have a problem:
c = d.createElementNS('', 'created-child')
d.documentElement.appendChild(c)
t = d.createTextNode("le mois d'août est chaud")
c.appendChild(t)
# The next statement prints both text nodes 'as is': the first one is
# UTF-8 and the second one is iso-8859-1
Print(d)
# If I ask for a print as iso-8859-1, I get an error when the second text
# node is being converted. Full traceback available on request.
Print(d, encoding='iso-8859-1')
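
For readers on a current Python, the kind of error hit by the last call can
be reproduced without 4DOM. This is only a sketch in Python 3 (not the 1.5.2
used above), and it assumes the failure comes from the serializer trying to
read the iso-8859-1 bytes of the hand-made text node as UTF-8. The variable
names are made up for illustration.

```python
# Assumption: the hand-made text node holds raw iso-8859-1 bytes.
latin1_bytes = "le mois d'août est chaud".encode("iso-8859-1")

# Reading those bytes as UTF-8 fails, because the byte for 'û' (0xFB)
# is not a valid UTF-8 start byte.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding with the right charset first, then encoding, gives clean UTF-8.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
```

With the text node stored as UTF-8 like the parsed one, both nodes would be
in the same encoding before any output conversion is attempted.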

The conclusion on the 4Suite mailing list was that I need Python 2 to get
'native' Unicode support, and that until then I will have to convert by
hand. :o(
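
Converting by hand is manageable, because every iso-8859-1 byte value is also
the Unicode code point of its character: bytes below 0x80 pass through
unchanged, and each of the others becomes two UTF-8 bytes. Below is a minimal
sketch in Python 3 syntax. latin1_to_utf8 is a hypothetical helper, not part
of 4DOM or PyXML, and on Python 1.5.2 the same arithmetic would have to be
done with ord() and chr() on plain strings.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as UTF-8 without using a codec."""
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            # ASCII range: identical in both encodings.
            out.append(byte)
        else:
            # Code points 0x80-0xFF need two bytes: 110xxxxx 10xxxxxx.
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)


converted = latin1_to_utf8("le mois d'août est chaud".encode("iso-8859-1"))
```

The result could then be passed to createTextNode in place of the original
iso-8859-1 string, assuming 4DOM expects UTF-8 there as the parsed node
suggests.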

-- 
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" - 
LOGILAB, Paris (France).