XML: minidom toxml() does not work for non-English files! :-(

Jarosław Zabiełło (delete .PL) webmaster at apologetyka.com.pl
Sat May 4 03:34:55 EDT 2002


I have a small piece of code:

from xml.dom import minidom  
xmldoc = minidom.parse('myfile.xml')
print xmldoc.toxml() 

It works fine for 7-bit text. But the problem is that it works ONLY
for pure ASCII text. :-( If I use any non-English characters,
Python raises an exception:

  UnicodeError: ASCII encoding error: ordinal not in range(128)

It does NOT work even on UTF-8 XML files that contain any character
outside the 7-bit ASCII character set. This is strange, because UTF-8
should be parsed correctly by all XML tools.

Does this mean that the toxml() and toprettyxml() methods of minidom
are useless for non-English strings? I need them to cut one big XML
file into smaller pieces and write them into several files.
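
To make the splitting task above concrete, here is a minimal sketch of
what it might look like. It assumes a current Python 3 interpreter, where
toxml() accepts an encoding argument and returns bytes; that differs from
the interpreter that produced the error above, so it is an illustration
rather than a confirmed fix. The error itself probably comes from print
trying to encode the Unicode result of toxml() as ASCII, but that is a
guess. The helper name split_children and the piece_N.xml file names are
hypothetical, and "one piece per child element of the root" is only an
assumed way of cutting the file.

```python
import os
import tempfile
from xml.dom import minidom


def split_children(xml_bytes, out_dir):
    """Write each child element of the root to its own UTF-8 file.

    Hypothetical helper: the name and the piece_N.xml naming scheme
    are assumptions made for this sketch.
    """
    doc = minidom.parseString(xml_bytes)
    # Keep only element nodes; skip whitespace text nodes and comments.
    children = [node for node in doc.documentElement.childNodes
                if node.nodeType == node.ELEMENT_NODE]
    paths = []
    for index, node in enumerate(children):
        path = os.path.join(out_dir, "piece_%d.xml" % index)
        # With an explicit encoding, toxml() returns bytes, so the file
        # is opened in binary mode and no implicit ASCII step is involved.
        with open(path, "wb") as handle:
            handle.write(node.toxml(encoding="utf-8"))
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = ('<?xml version="1.0" encoding="utf-8"?>'
              '<root><a>Zabiełło</a><b>żółć</b></root>').encode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        for written in split_children(sample, tmp):
            print(os.path.basename(written))
```

The point of the sketch is that the text is encoded explicitly when it is
written out, instead of leaving the choice of encoding to print.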

-- 
Jarosław Zabiełło (UIN: 6712522)
URL: http://www.pik-net.pl/~zbiru


