How can I convert an invalid ASCII string to Unicode?

skip at
Tue May 8 23:29:20 EDT 2001

I have been blissfully ignoring Unicode.  Alas, my bliss has been so rudely
interrupted.

Suppose I have this string:

    s = "ö"	    # "o" with an umlaut

and I'd like to convert it to UTF-8.  (I know I can prefix string literals
with 'u', but that's not an option here.  Pretend s was assigned from a file.)

Simply executing

    u = unicode(s)

fails because ord(s) is > 127.  I eventually figured out that the following
would work:

    u = "".join([unichr(ord(c)) for c in s])
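For readers on Python 3, where unicode() and unichr() no longer exist, here is a minimal sketch of the same situation. It assumes bytes stand in for Python 2's plain strings and that the data is Latin-1. It shows why the join/unichr workaround succeeds: Latin-1 maps each byte value 0 to 255 to the code point with the same number, so converting byte by byte is exactly Latin-1 decoding, while the ASCII codec rejects anything above 127.

```python
# Python 3 sketch: bytes play the role of Python 2's plain str here.
s = b"\xf6"  # "o" with an umlaut, assuming the data is Latin-1

# The ASCII codec rejects any byte > 127, just as unicode(s) did.
try:
    s.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("ascii failed:", exc.reason)

# Map each byte value to the code point with the same number,
# the analogue of "".join([unichr(ord(c)) for c in s]) ...
u_manual = "".join(chr(b) for b in s)

# ... which is exactly what the Latin-1 codec does.
u_latin1 = s.decode("latin-1")

# The equivalence holds for every possible byte, not just 0xF6.
all_bytes = bytes(range(256))
assert "".join(chr(b) for b in all_bytes) == all_bytes.decode("latin-1")
assert u_manual == u_latin1
```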

but this seems a bit obscure.  Is there a cleaner way to convert plain
strings containing characters > 127 to UTF-8?  Ideally I guess I'd like
plain strings to be interpreted as Latin-1 instead of ASCII by default, even
though my locale is 'murican.
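In the Python 2 of this post, the spelled-out form of this conversion is unicode(s, "latin-1"), followed by .encode("utf-8") if UTF-8 bytes are actually wanted. The sketch below does the same in Python 3, assuming the file is Latin-1 encoded: rather than changing a process-wide default, it names the encoding when opening the file, so the locale's default never comes into play. The temporary file exists only to keep the example self-contained.

```python
import os
import tempfile

# Create a small file holding one Latin-1 byte (0xF6, "o" with an umlaut).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xf6")
    path = tmp.name

try:
    # Name the encoding explicitly instead of relying on the locale default.
    with open(path, encoding="latin-1") as f:
        text = f.read()
finally:
    os.remove(path)

# Encode the decoded text only if UTF-8 bytes are what you actually need.
utf8 = text.encode("utf-8")
print(repr(text), utf8)
```

U+00F6 takes two bytes in UTF-8 (0xC3 0xB6), which is why the decoded text and its UTF-8 encoding are kept as separate values.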


Skip Montanaro (skip at
