Problem with minidom and special chars in HTML
Horst Gutmann
zerok at zerokspot.com
Tue Feb 22 11:20:42 EST 2005
Hi :-)
I currently have quite a big problem with minidom and special chars (for
example ü) in HTML.
Let's say I have the following input file:
--------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<body>
ü
</body>
</html>
--------------------------------------------------
And the following Python script:
--------------------------------------------------
from xml.dom import minidom

if __name__ == '__main__':
    doc = minidom.parse('test2.html')
    f = open('test3.html','w+')
    f.write(doc.toxml())
    f.close()
--------------------------------------------------
test3.html only has a blank line where the ü should be. It is simply
removed.
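One thing that may be worth checking is the encoding, since the XML
declaration above names none. The following is only a minimal sketch under
assumptions not stated in this post: Python 3, input encoded as UTF-8 with
an explicit encoding declaration, and the encoding passed to toxml(). The
names SOURCE and roundtrip are hypothetical, and the DOCTYPE is left out for
brevity. It is not confirmed to be the cause of the behaviour described.

```python
from xml.dom import minidom

# Hypothetical in-memory stand-in for test2.html (assumption: UTF-8 bytes
# with the encoding declared in the XML declaration).
SOURCE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html>\n<body>\n\u00fc\n</body>\n</html>\n'
).encode('utf-8')


def roundtrip(data):
    """Parse XML bytes and serialize them again as UTF-8 bytes."""
    doc = minidom.parseString(data)
    # With an encoding argument, toxml() returns bytes in Python 3.
    return doc.toxml(encoding='utf-8')


result = roundtrip(SOURCE)

# Write the bytes unchanged; binary mode avoids any implicit re-encoding.
with open('test3.html', 'wb') as f:
    f.write(result)
```

If the ü survives this round trip, the encoding of the original file (or the
missing declaration) would be the first thing to look at.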
Any idea how I could solve this problem?
Best regards, Horst