Iterating over unicode strings

Mon Mar 11 00:32:19 EST 2002

Arun Sharma <arun-public at sharma-home.net> writes:

> line = u"ಡಾ|| ಶಿವರಾಮ ಕಾರಂತ"
> for c in line:
>      print c
> 
> fails miserably. What is the right way to do it? I would also like to
> be able to slice the string, i.e. line[i], to get the i'th character.

I'm not sure what you expected to happen, but I believe your program
works "correctly": it prints one character at a time.

The real question is what you wanted to happen. Apparently, you want
to use UTF-8 in your string literal. This is currently not directly
supported: the bytes of a Unicode literal are interpreted as Latin-1,
so each byte of a UTF-8 sequence becomes a separate character.
Instead, use

line = unicode("your text", "utf-8")
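
A minimal sketch of the difference, assuming a short UTF-8 byte
sequence (the Kannada code points U+0CA1 U+0CBE, chosen here purely
for illustration). The variable names are hypothetical, and the bytes
are written as escapes so the source stays plain ASCII. It uses
.decode(), which should do the same thing as the unicode() call above:

```python
# UTF-8 encoding of two code points, U+0CA1 and U+0CBE (illustrative
# sample data, written as escapes so the source file stays ASCII).
raw = b"\xe0\xb2\xa1\xe0\xb2\xbe"

# Reading the bytes as Latin-1 maps every byte to its own character,
# which is what happens to UTF-8 bytes placed inside a u"..." literal.
as_latin1 = raw.decode("latin-1")

# Decoding as UTF-8 yields the intended code points; this corresponds
# to unicode(raw, "utf-8") in the reply above.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1))  # 6
print(len(as_utf8))    # 2

# Iteration and indexing now work per code point.
code_points = [hex(ord(c)) for c in as_utf8]
print(code_points)     # ['0xca1', '0xcbe']
```

Once the string has been decoded this way, "for c in line" and line[i]
operate on code points rather than on individual UTF-8 bytes.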

HTH,
Martin