Iterating over unicode strings

Jason Orendorff jason at jorendorff.com
Sun Mar 10 23:48:44 EST 2002


Arun Sharma wrote:
> I would like to iterate over the following unicode string one character 
> at a time.
> 
> line = u"ಡಾ|| ಶಿವರಾಮ ಕಾರಂತ"
> for c in line:
>      print c
> 
> fails miserably. What is the right way to do it ? I would also like to 
> be able to slice the string i.e. line[i] to get the i'th character.

I don't have the fonts to view your string, unfortunately.

There are two possible problems in your sample code, I think.
Neither one has to do with slicing or indexing or "for c in line".
Both come from Python not knowing which encoding is in use.

   1.  Your unicode string might be wrong.

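       Python doesn't know the encoding of your source file,
       unfortunately, so it can't reliably decode non-ASCII
       bytes inside a u"..." literal; they may be misread or
       rejected.  To fix this, decode a plain string with an
       explicit encoding (assuming the file is saved as UTF-8):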

       line = unicode("ಡಾ|| ಶಿವರಾಮ ಕಾರಂತ", 'utf-8')

   2.  print won't work.  :(

       Python doesn't know the encoding of your terminal,
       unfortunately, so it assumes ASCII.  Any output that's
       not ASCII causes an error.  To fix this, encode each
       string yourself, using the encoding your terminal
       expects (UTF-8 here):

       print c.encode('utf-8')
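
Putting the two fixes together, a minimal sketch might look like the
following.  The byte string is an assumed stand-in for the start of your
line (the first two Kannada code points followed by "||"), written with
escapes so the example does not depend on how the file itself is saved.
The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: "raw" is an assumed sample, the UTF-8 bytes of
# U+0CA1 and U+0CBE followed by the two ASCII bars.
raw = b"\xe0\xb2\xa1\xe0\xb2\xbe||"

# Fix 1: decode the bytes with an explicit encoding to get a
# unicode string instead of relying on a default.
line = raw.decode("utf-8")

# Iteration and indexing now work per code point, not per byte.
chars = [c for c in line]
first = line[0]

# Fix 2: encode each piece explicitly before sending it to a
# byte-oriented terminal.
encoded = [c.encode("utf-8") for c in line]
```

Here the 8 bytes decode to 4 code points, and line[0] is the single
code point U+0CA1.  Note that a vowel sign such as U+0CBE is its own
code point, so what looks like one character on screen can still come
out as two items in the loop.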

Hope this helps.

## Jason Orendorff    http://www.jorendorff.com/




