Unicode strings -> xml.dom.minidom Text elements?

Alan Kennedy alanmk at hotmail.com
Mon Oct 21 19:52:09 CEST 2002

Patrick Surry wrote:
> I've got a unicode string like:
> a = u'ABC\u03A3DEF'
> and am stuffing it into an xml.dom.minidom Text() element.  But when I
> serialize the document with doc.writexml(), it turns into:
> <text>ABC?DEF</text>

Where are you writing the xml to? To a file? To a character terminal?

If you're writing to a file, then the following questions are also
relevant:

 o What encoding are you using when writing the file?
 o Is that encoding correctly declared in the file?
 o How are you viewing the contents of the file? (e.g. browser, text
editor, etc)
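
To make the first two questions concrete, here is a minimal sketch of
serializing with an explicit, declared encoding. It assumes a current
Python 3 interpreter rather than the 2.2 used elsewhere in this thread,
and the document built here is only a stand-in for yours.

```python
from xml.dom import minidom

# Build a tiny document holding the string from the question.
doc = minidom.Document()
root = doc.createElement("text")
root.appendChild(doc.createTextNode("ABC\u03a3DEF"))
doc.appendChild(root)

# Passing an encoding makes toxml() return bytes and puts the same
# encoding name into the XML declaration, so the two cannot disagree.
xml_bytes = doc.toxml(encoding="utf-8")

# Reading the bytes back should give the original text unchanged.
round_trip = minidom.parseString(xml_bytes).documentElement.firstChild.data
print(round_trip == "ABC\u03a3DEF")
```

If you write xml_bytes to a file opened in binary mode, a viewer that
honours the declaration should show the sigma rather than a "?".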

If you are viewing it on a character terminal, what character set does
the terminal use? (On Windows, for example, the command "chcp" shows
the "code page" in use.)

What is the default encoding of your Python installation? Check this
with "import sys; sys.getdefaultencoding()"
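
As a small script, the check might look like the sketch below. It
assumes Python 3, where the default encoding is normally "utf-8"; the
stdout attribute is read defensively because it can be missing or None
when output is redirected to something that is not a text stream.

```python
import sys

# Encoding Python falls back on when none is given explicitly.
default_encoding = sys.getdefaultencoding()

# Encoding of the stream that print() writes to; this is what decides
# whether a character such as U+03A3 can be shown on your terminal.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("default encoding:", default_encoding)
print("stdout encoding: ", stdout_encoding)
```

The values printed depend on your platform and terminal settings, so
compare them with what "chcp" (or your terminal's configuration) says.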

I have my default Python encoding set to "iso-8859-1", and observe the
following behaviour:

Python 2.2.1 (#34, Apr  9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u"ABC\u03A3DEF"
>>> s
>>> import sys
>>> sys.stdout.write(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: Latin-1 encoding error: ordinal not in range(256)
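
The same failure, and one possible source of the "?" in your output,
can be sketched as follows. This assumes Python 3 and is only a guess
at what happens in your setup: some layer between your string and what
you see may be encoding to a character set without a capital sigma and
substituting "?" instead of raising an error.

```python
s = "ABC\u03a3DEF"

# Strict encoding to Latin-1 fails, as in the traceback above,
# because U+03A3 has no Latin-1 equivalent.
try:
    s.encode("latin-1")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# The "replace" error handler substitutes "?" for the character.
replaced = s.encode("latin-1", "replace")

# For XML, a character reference keeps the information instead.
char_ref = s.encode("latin-1", "xmlcharrefreplace")

# UTF-8 can represent the character directly.
utf8 = s.encode("utf-8")

print(replaced)   # b'ABC?DEF'
print(char_ref)   # b'ABC&#931;DEF'
print(utf8)       # b'ABC\xce\xa3DEF'
```

If your output matches the "replace" case, the encoding used for
writing (or displaying) is the thing to change, not the string itself.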

alan kennedy
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan
