Replacement in unicodestrings?
KvS
keesvanschaik at gmail.com
Sat Oct 4 22:34:08 EDT 2008
Dear all,
could somebody please just put an end to the unicode misery I'm in,
man... The situation is that I have a Tkinter program that lets the
user enter data in some Entries and this data needs to be transformed
to an encoding compatible with an .rtf file. In fact I only need to
handle some of the usual symbols like ë etc.
Here's the function that I am using:
def pythonUnicodeToRTFAscii(self,s):
    if isinstance(s,str):
        return s
    s_str=repr(s.encode('UTF-8'))
    replDic={'\xc3\xa0':"\\'e0",'\xc3\xa4':"\\'e4",'\xc3\xa1':"\\'e1",
             '\xc3\xa8':"\\'e8",'\xc3\xab':"\\'eb",'\xc3\xa9':"\\'e9",
             '\xc3\xb2':"\\'f2",'\xc3\xb6':"\\'f6",'\xc3\xb3':"\\'f3",
             '\xe2\x82\xac':"\\'80"}
    for k in replDic.keys():
        if repr(k) in s_str:
            s_str=s_str.replace(repr(k),replDic[k])
    return s_str
So replDic represents the mapping from one encoding to the other. Now,
if I enter e.g. 'Arjën' in the Entry, then s_str in the above function
becomes 'Arj\xc3\xabn' and since replDic contains the key \xc3\xab I
would expect the replacement in the final lines of the function to
kick in. This however doesn't happen, there's no match.
However, interactively:
>>> '\xc3\xab' in 'Arj\xc3\xabn'
True
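A note on this comparison: the interactive test searches the raw byte string, whereas the function searches the output of repr(). repr() wraps its result in quote characters (and, on Python 3, a b prefix), so repr(k) carries a closing quote that never directly follows \xab inside repr() of the whole string. The following is only a minimal sketch of that difference; the names data, key and inner are made up for illustration and are not from the program above.

```python
# The UTF-8 bytes for u'Arjën' and for u'ë' (illustrative names).
data = u"Arj\xebn".encode("utf-8")
key = b"\xc3\xab"

# Searching the raw bytes works, as in the interactive session.
found_in_bytes = key in data

# repr(key) includes its own surrounding quotes, so it is not a
# substring of repr(data), where \xab is followed by 'n', not a quote.
found_in_repr = repr(key) in repr(data)

# Dropping the delimiters (and the b prefix, if any) from repr(key)
# leaves just the escaped text, which does occur in repr(data).
inner = repr(key).lstrip("b")[1:-1]
found_inner = inner in repr(data)
```

Here found_in_bytes and found_inner are True while found_in_repr is False, which would match the behaviour described: the keys are compared with their quotes attached.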
I just don't get it, what's the difference? And is the above the
best way to attack such a problem anyway?
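One possible alternative, offered only as a sketch: the \'hh codes in replDic (\'eb for ë, \'80 for the euro sign) coincide with the Windows-1252 byte values of those characters, so the lookup table could perhaps be replaced by encoding each character to cp1252 and escaping every byte of 0x80 or above. This assumes the .rtf file is read as Windows-1252 (for example via \ansicpg1252 in its header) and that the input is unicode text; the function name to_rtf_ansi is hypothetical.

```python
def to_rtf_ansi(text):
    """Hypothetical helper: write non-ASCII characters as RTF \\'hh escapes.

    Assumes the RTF document uses the Windows-1252 code page.
    """
    out = []
    for ch in text:
        # cp1252 is a single-byte encoding, so each character maps to one
        # byte; characters outside cp1252 raise UnicodeEncodeError here.
        code = ord(ch.encode("cp1252"))
        if code < 0x80:
            out.append(ch)                 # plain ASCII passes through
        else:
            out.append("\\'%02x" % code)   # e.g. 0xEB becomes \'eb
    return "".join(out)


escaped_name = to_rtf_ansi(u"Arj\xebn")
escaped_euro = to_rtf_ansi(u"\u20ac")
```

The sketch does not escape backslashes or braces, which also have special meaning in RTF, and it does nothing for characters that cp1252 cannot represent.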
Thanks & best wishes, Kees