xml.dom.minidom character encoding
Peter Otten
__peter__ at web.de
Wed Apr 21 14:55:22 EDT 2010
C. Benson Manica wrote:
> On Apr 21, 2:25 pm, Peter Otten <__pete... at web.de> wrote:
>
>> Are you sure that your script has
>>
>> str = u"..."
>>
>> like in your post and not just
>>
>> str = "..."
>
> No :-)
>
> str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
> doc=xml.dom.minidom.parseString( str.encode("utf-8") )
> xml=doc.toxml( encoding="utf-8")
> file=codecs.open( "foo.xml", "w", "utf-8" )
> file.write( xml )
> file.close()
>
> fails:
>
>   File "./demo.py", line 12, in <module>
>     file.write( xml )
>   File "/usr/lib/python2.5/codecs.py", line 638, in write
>     return self.writer.write(data)
>   File "/usr/lib/python2.5/codecs.py", line 303, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)
But that's a different error (codecs.open().write()) on a different line.
What you said was failing (xml.dom.minidom.parseString()) worked.
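Where the UnicodeDecodeError comes from: with an explicit encoding, toxml() returns an already-encoded byte string, and in Python 2 a codecs writer that is handed a byte string first decodes it with the default 'ascii' codec before encoding it again. The sketch below is written for Python 3, which no longer performs that implicit decode, so it does the ascii decode explicitly to reproduce the failing step. The variable names are illustrative.

```python
import xml.dom.minidom

# Same document as in the thread; \u00f3 is the character "ó".
xml_text = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\u00f3"/></elements>'
doc = xml.dom.minidom.parseString(xml_text.encode("utf-8"))

# With an explicit encoding, toxml() returns bytes, not text.
data = doc.toxml(encoding="utf-8")

# Python 2's codecs writer implicitly did the equivalent of data.decode("ascii")
# before re-encoding; doing it explicitly here reproduces the error.
failure = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)
```

Byte 62 is the first byte of the UTF-8 encoding of "ó" in the serialized document, which is why the traceback points there.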
> but dropping the encoding argument to doc.toxml() seems to finally
> work. I'd be curious to know why the code you posted (that worked for
> you) didn't for me, but at this point I'm just happy with something
> functional. Thank you very kindly!
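Dropping the encoding argument works because toxml() then returns a text (unicode) string, so the codecs writer has exactly one encoding step to perform. A minimal Python 3 sketch of that variant follows; Python 3's built-in open() with an encoding argument stands in for codecs.open(), and the temporary output path is an assumption made only to keep the example self-contained.

```python
import os
import tempfile
import xml.dom.minidom

xml_text = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\u00f3"/></elements>'
doc = xml.dom.minidom.parseString(xml_text.encode("utf-8"))

# No encoding argument: toxml() returns text, which the file object encodes once.
text = doc.toxml()

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")
with open(path, "w", encoding="utf-8") as out:
    out.write(text)

# Read the raw bytes back and check that the attribute survived the round trip.
with open(path, "rb") as f:
    reread = xml.dom.minidom.parseString(f.read())
attrib = reread.documentElement.firstChild.getAttribute("attrib")
```

One side effect: the XML declaration produced this way carries no encoding attribute. That is harmless here because XML parsers assume UTF-8 when no encoding is declared.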
The following worked for me and should work for you, too:
$ cat tmp.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import xml.dom.minidom
str = u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
doc = xml.dom.minidom.parseString(str.encode("utf-8"))
xml = doc.toxml(encoding="utf-8")
file = open("foo.xml", "w")
file.write( xml )
file.close()
$ python2.5 tmp.py
$ cat foo.xml
<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="ó"/></elements>$
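The script above works in Python 2 because toxml(encoding="utf-8") returns an already-encoded byte string and a plain open() file writes it unchanged. Under Python 3 the same approach needs the file opened in binary mode ("wb"), since a text-mode file rejects bytes. The sketch below is a port under that assumption; the variable is renamed to avoid shadowing str, and the output goes to a temporary directory rather than the current one.

```python
import os
import tempfile
import xml.dom.minidom

source = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\u00f3"/></elements>'
doc = xml.dom.minidom.parseString(source.encode("utf-8"))

# Explicit encoding: the result is already UTF-8 bytes.
payload = doc.toxml(encoding="utf-8")

# Binary mode writes the bytes as-is; no second encoding step happens.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")
with open(path, "wb") as out:
    out.write(payload)

with open(path, "rb") as f:
    written = f.read()
```

The resulting file holds the same bytes as the foo.xml shown in the session.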
Btw., str is a bad variable name because it shadows the builtin str type.
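A small illustration of that shadowing problem: once the name str is bound to a string, a later call such as str(42) in the same scope invokes that string object instead of the builtin type. The function below is a hypothetical example that keeps the shadowing confined to a local scope.

```python
def build_label():
    str = u"<elements/>"  # local name now hides the builtin str type
    return str(42)        # calls the string object, not the type

error = None
try:
    build_label()
except TypeError as exc:
    error = exc  # "'str' object is not callable"
```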
Peter