Unicode characters
Paul Johnston
paul.johnston at manchester.ac.uk
Mon Sep 4 09:39:36 EDT 2006
Hi
I have a string which I convert into a list, then read through it,
printing each character's glyph and numeric representation:
#-*- coding: utf-8 -*-
thestring = "abcd"
thelist = list(thestring)
for c in thelist:
    print c,
    print ord(c)
This works fine for Latin characters, but when I put in a Unicode
character, a two-byte character gives me two characters. For example,
an Arabic alef returns
* 216
* 167
(the first asterisk is the empty set symbol, the second a double "s")
Putting in sequential characters, i.e. alef, beh, teh marbuta, gives me
sequential listings, i.e.
216 167
216 168
216 169
So it is reading the correct details.
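Those pairs match the UTF-8 encoding of each character, with one list item per byte rather than per character. A minimal sketch of that reading follows. It assumes the text is UTF-8, as the coding line declares. The variable names are illustrative, and bytearray needs a newer Python than the 2.4.3 mentioned in this post:

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes UTF-8 text; all names here are illustrative.
text = u"\u0627\u0628\u0629"  # alef, beh, teh marbuta

# Iterating over the encoded bytes gives two values per character.
encoded = text.encode("utf-8")
byte_values = list(bytearray(encoded))

# Iterating over the decoded text gives one code point per character.
code_points = [ord(c) for c in encoded.decode("utf-8")]

print(byte_values)
print(code_points)
```

Under that assumption, the byte values are the 216 167, 216 168, 216 169 pairs listed above, while the decoded text yields a single number per character.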
Is there any way to get the c in the for loop to recognise that it is
reading a multiple-byte character?
I have followed the info in PEP 0263 and am using Python 2.4.3 Build
12 on a Windows box within Eclipse 3.2.0 and Python plugins 1.2.2
Cheers Paul