Unicode problem

Fredrik Lundh fredrik at pythonware.com
Wed Mar 20 15:55:10 EST 2002


gargravarr at whoever.com wrote:
> I have a problem with Unicode strings, the error I get is:
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> It is caused by
> body.write('<h2>%s</h2>\n' % i.firstChild.data)
>
> Where body is a StringIO object, and i is a xml.dom.minidom object.
> The characters it reacts to is the 3 Norwegian 'extra' ones: æ, ø, å
>  (if you can see them)

if you don't tell Python which 8-bit encoding to use when a
Unicode string has to be converted to an 8-bit string, Python
will assume ASCII, and ASCII cannot represent æ, ø, or å.

to convert to an 8-bit string with a known encoding,
use the encode method:

    s = i.firstChild.data.encode("iso-8859-1")
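
a minimal sketch of the difference, assuming a current Python 3
interpreter (where the ASCII conversion has to be requested
explicitly instead of happening behind your back). the sample
text is made up for illustration:

```python
# sample text: the three Norwegian letters from the question
# (æ, ø, å), written as escapes so the source stays pure ASCII
text = "\u00e6\u00f8\u00e5"

# the ASCII codec only covers ordinals 0..127, so this fails
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# naming an encoding that contains the characters works;
# iso-8859-1 stores each of them in a single byte
latin1 = text.encode("iso-8859-1")
```

here ascii_ok ends up false, and latin1 holds three bytes, one
per character.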

you should probably escape < > & too; consider using
something like:

    from cgi import escape

    def encode(s):
        if not s: return ""
        return escape(s.encode("iso-8859-1"))

    body.write('<h2>%s</h2>\n' % encode(i.firstChild.data))
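
for readers on Python 3, a rough equivalent might look like the
sketch below. it rests on a few assumptions that are not part of
the original advice: cgi.escape is no longer available there, so
html.escape stands in for it; the buffer is an io.StringIO that
holds text; and the conversion to iso-8859-1 is done once, when
the finished page is turned into bytes. the sample string is
made up:

```python
from html import escape
from io import StringIO

def encode(s):
    # empty or missing text becomes an empty string
    if not s:
        return ""
    # quote=False escapes only &, < and >, leaving quotes alone
    return escape(s, quote=False)

body = StringIO()

# hypothetical node text standing in for i.firstChild.data
data = "r\u00f8d & bl\u00e5 <b>"
body.write('<h2>%s</h2>\n' % encode(data))

# pick the 8-bit encoding once, when producing the final bytes
page = body.getvalue().encode("iso-8859-1")
```

escaping stays a text-level operation in this version, and the
choice of encoding is made in exactly one place.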

</F>

<!-- (the eff-bot guide to) the python standard library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->




