Iterating over unicode strings
Jason Orendorff
jason at jorendorff.com
Sun Mar 10 23:48:44 EST 2002
Arun Sharma wrote:
> I would like to iterate over the following unicode string one character
> at a time.
>
> line = u"ಡಾ|| ಶಿವರಾಮ ಕಾರಂತ"
> for c in line:
>     print c
>
> fails miserably. What is the right way to do it ? I would also like to
> be able to slice the string i.e. line[i] to get the i'th character.
I don't have the fonts to view your string, unfortunately.
There are two possible problems in your sample code, I think.
Neither one has to do with slicing or indexing or "for c in line".
Both come from Python not knowing which encoding is in use.
1. Your unicode string might be wrong.
   Python doesn't know the encoding of your program,
   unfortunately, so it assumes ASCII. Anything that's
   not ASCII causes an error. To fix this, specify the
   encoding (here assuming the file is saved as UTF-8):
       line = unicode("ಡಾ|| ಶಿವರಾಮ ಕಾರಂತ", 'utf-8')
2. print won't work. :(
   Python doesn't know the encoding of your terminal,
   unfortunately, so it assumes ASCII. Any output that's
   not ASCII causes an error. To fix this, specify the
   encoding (here assuming a UTF-8 terminal):
       print c.encode('utf-8')
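To make the two fixes concrete, here is a minimal sketch that decodes
explicitly, iterates and indexes, then encodes each character for output.
It is written in a later Python spelling than the lines above (a bytes
literal plus .decode() in place of unicode()), which is an assumption on
my part. The names raw, chars, first and encoded are only illustrative,
and the bytes are the UTF-8 encoding of just the first two code points of
the sample string.

```python
# UTF-8 bytes for the first two code points of the sample string:
# U+0CA1 KANNADA LETTER DDA and U+0CBE KANNADA VOWEL SIGN AA.
raw = b"\xe0\xb2\xa1\xe0\xb2\xbe"

# Fix 1: decode with an explicit encoding instead of relying on a default.
line = raw.decode("utf-8")

# Once decoded, iteration and indexing both work per code point.
chars = [c for c in line]
first = line[0]

# Fix 2: encode each character explicitly before sending it to a
# byte-oriented output such as a terminal (assumed here to expect UTF-8).
encoded = [c.encode("utf-8") for c in line]
```

Iteration yields code points, so the consonant and its vowel sign come
out as two separate items even though they display as one glyph.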
Hope this helps.
## Jason Orendorff http://www.jorendorff.com/
More information about the Python-list mailing list