[New-bugs-announce] [issue1290] xml.dom.minidom not able to handle utf-8 data

Sharmila Sivakumar report at bugs.python.org
Thu Oct 18 03:58:15 CEST 2007


New submission from Sharmila Sivakumar:

I am trying to load the data in the attached testdata.txt file into a DOM.

I tried:

import xml.dom.minidom as dom
data = open('testdata.txt','r').read()
mydom = dom.parseString(data)

I get the following error:

>>> mydom.firstChild.childNodes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 18: ordinal not in range(128)


So I tried decoding the data first and parsing the result, but that failed as well.

>>> mydom2 = dom.parseString(data.decode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u014d' in position 173: ordinal not in range(128)
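
For comparison, here is a minimal sketch of the same steps done entirely
with UTF-8 byte strings. It is only an illustration under stated
assumptions: the contents of testdata.txt are not shown in this report, so
the XML string below is a hypothetical stand-in that merely contains the
two characters named in the tracebacks (U+2022 and U+014D). The first
traceback has no frames inside minidom, which suggests (but does not
prove) that parsing succeeded and only the display of the node list
failed. The sketch therefore reads the text data directly and encodes it
explicitly. It is written so that it should also run on Python 3.

```python
import xml.dom.minidom as dom

# Hypothetical stand-in for testdata.txt (the real file is not shown here):
# UTF-8 encoded bytes containing U+2022 and U+014D.
data = u'<root><item>\u2022 T\u014dky\u014d</item></root>'.encode('utf-8')

# Pass the raw UTF-8 bytes to the parser and let it do the decoding,
# instead of decoding to a unicode string first.
mydom = dom.parseString(data)

# Read the character data directly rather than relying on the repr()
# of the node list at the interactive prompt.
text = mydom.firstChild.firstChild.firstChild.data

# Encode explicitly before writing to a byte-oriented destination
# such as a terminal or a file opened in binary mode.
encoded = text.encode('utf-8')
```

If this works with the real file, the problem would be limited to how
the nodes are displayed and to passing an already decoded string to
parseString, not to the UTF-8 data itself.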


I am willing to fix this myself if given permission.

----------
components: Interpreter Core, Unicode, XML
files: testdata.txt
messages: 56511
nosy: sharmila
severity: normal
status: open
title: xml.dom.minidom not able to handle utf-8 data
type: compile error
versions: Python 2.5
Added file: http://bugs.python.org/file8558/testdata.txt

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1290>
__________________________________
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testdata.txt
Url: http://mail.python.org/pipermail/new-bugs-announce/attachments/20071018/dc3ca282/attachment.txt 

