BUG? Python 2.0 chokes on international characters in Unicode strings

Wed Jan 31 08:35:24 EST 2001

This may explain why I was having problems when I extracted data from
an international database. One record contained o-accent, u-umlaut,
and e-umlaut. Python crashed every time I tried to perform some string
operation on a string containing these characters or if I tried to
print those strings. I think the error was a Unicode coercion error or
something like that. We may be looking at a previously unreported bug.

Dave

"Jurie Horneman" <jhorSPAMBGONneman at wanadoo.fr> wrote:

>I've checked the FAQ and the buglist, but I couldn't find anything on this.
>
>I was doing some COM programming with Python 2.0 on Windows 2000 yesterday.
>I retrieved some Unicode strings containing international characters: e
>accent grave, c accent circumflex, that kind of thing (I live in France).
>Printing these strings raised a Unicode exception. Some investigation showed
>that the exception was raised when trying to print the international
>characters. Replacing them by some dummy character "solved" the problem: the
>strings printed fine.
>
>It's odd: I've seen French COM error messages printed by Python. The
>international characters were not printed correctly ('u/351'), but at least
>it didn't crash...
>
>I haven't verified if the same thing happens with ASCII strings.
>
>Is this a known bug? (If so, AAARRGGHHH - could it really be that Python
>basically doesn't work outside of the US? Hard to believe: these characters
>are used in the Dutch language...)
>Is there some workaround? Could I convert Unicode strings to ASCII? If so,
>how?
>
>Thanks for any help on this. Please put my work address
>jhorneman at kalisto.com in copy if you can, I'm writing this from home.
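
On the "convert Unicode strings to ASCII" question, here is a minimal
sketch of one possible workaround. It assumes the exception comes from
the implicit conversion to ASCII that happens when a Unicode string is
printed, which neither message above confirms. The helper name to_ascii
and the sample string are made up for illustration; only the encode()
method with its "replace" and "ignore" error modes is standard.

```python
# Hypothetical helper for illustration; not part of Python or the COM extensions.
def to_ascii(text, errors="replace"):
    # errors="replace" substitutes "?" for every character ASCII cannot represent.
    # errors="ignore" silently drops those characters instead.
    return text.encode("ascii", errors)

# Made-up sample: c-cedilla, a-grave, u-umlaut written as Unicode escapes,
# so the source file itself stays pure ASCII.
sample = u"gar\u00e7on \u00e0 Z\u00fcrich"

safe = to_ascii(sample)                # lossy: accented characters become "?"
stripped = to_ascii(sample, "ignore")  # lossy: accented characters are removed
latin1 = sample.encode("latin-1")      # lossless for Western European text
```

Both ASCII conversions lose information, so they only suit debugging
output. If the console or file can handle it, encoding explicitly to
something like Latin-1 keeps the accented characters intact.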