Replacement in Unicode strings?

KvS keesvanschaik at gmail.com
Sat Oct 4 22:34:08 EDT 2008


Dear all,

could somebody please just put an end to the Unicode misery I'm in,
man... The situation is that I have a Tkinter program that lets the
user enter data in some Entries, and this data needs to be transformed
to the encoding used in an RTF file. In fact I only need to handle the
usual accented symbols such as ë.

Here's the function that I am using:

    def pythonUnicodeToRTFAscii(self,s):
        if isinstance(s,str):
            return s
        s_str=repr(s.encode('UTF-8'))
        replDic={'\xc3\xa0':"\\'e0",'\xc3\xa4':"\\'e4",'\xc3\xa1':"\\'e1",
                 '\xc3\xa8':"\\'e8",'\xc3\xab':"\\'eb",'\xc3\xa9':"\\'e9",
                 '\xc3\xb2':"\\'f2",'\xc3\xb6':"\\'f6",'\xc3\xb3':"\\'f3",
                 '\xe2\x82\xac':"\\'80"}
        for k in replDic.keys():
            if repr(k) in s_str:
                s_str=s_str.replace(repr(k),replDic[k])
        return s_str

So replDic represents the mapping from one encoding to the other. Now,
if I enter e.g. 'Arjën' in the Entry, then s_str in the above function
becomes 'Arj\xc3\xabn' and since replDic contains the key \xc3\xab I
would expect the replacement in the final lines of the function to
kick in. However, this doesn't happen: there's no match.
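The values in replDic appear to be RTF `\'hh` escapes of each character's Windows-1252 (cp1252) byte, while the keys are the character's UTF-8 bytes. The short sketch below makes both encodings of ë visible. It is written for Python 3 rather than the post's Python 2 (Python 2's byte-string `str` corresponds to `bytes` here), and treating cp1252 as the RTF file's code page is an assumption inferred from the `\'80` value for €.

```python
# Python 3 sketch: Python 2's byte-string `str` corresponds to `bytes` here.
ch = "ë"

utf8 = ch.encode("utf-8")     # UTF-8 bytes, like the replDic keys
cp1252 = ch.encode("cp1252")  # Windows-1252 byte (assumed RTF code page)

print(utf8)    # b'\xc3\xab'
print(cp1252)  # b'\xeb'

# An RTF hex escape \'hh names a single code-page byte.
rtf_escape = "\\'%02x" % cp1252[0]
print(rtf_escape)  # \'eb

# The same relationship holds for the other characters in replDic.
for c in "àäáèëéòöó€":
    print(c, c.encode("utf-8"), "\\'%02x" % c.encode("cp1252")[0])
```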

Yet interactively:

>>> '\xc3\xab' in 'Arj\xc3\xabn'
True
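The interactive test compares raw byte strings, whereas the function compares `repr` strings. The Python 3 sketch below (again with `bytes` standing in for Python 2's `str`, and reusing the post's variable names for illustration) suggests why that matters: `repr` wraps its result in quotes, so `repr(k)` carries those quote characters along and cannot occur inside `repr` of a longer string. In Python 2 the reprs would lack the `b` prefix, but the quotes would still be there.

```python
# Python 3 sketch of the comparison the function performs.
s = "Arjën"
s_str = repr(s.encode("UTF-8"))
key = "ë".encode("UTF-8")

print(s_str)      # b'Arj\xc3\xabn'
print(repr(key))  # b'\xc3\xab'

# repr(key) includes the leading b' and trailing ', so it never matches.
print(repr(key) in s_str)  # False

# Without the surrounding b'...' (Python 3) the escaped bytes do appear.
print(repr(key)[2:-1] in s_str)  # True

# Comparing the raw byte strings, as the interactive session does, matches.
print(key in s.encode("UTF-8"))  # True
```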

I just don't get it: what's the difference? And is the above the best
way to attack such a problem anyway?
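One alternative, sketched here in Python 3 rather than the post's Python 2, skips the UTF-8/`repr` round trip and escapes each character of the Unicode string directly. It assumes the RTF file declares the Windows-1252 code page (which the `\'80` for € in replDic suggests); the function name `unicode_to_rtf` is hypothetical, and characters outside cp1252 fall back to RTF's `\uN` escape, where N is a signed 16-bit UTF-16 code unit followed by a `?` substitute character.

```python
def unicode_to_rtf(text: str) -> str:
    """Escape a Unicode string for use in an RTF body (sketch).

    Assumes the RTF header declares Windows-1252 (\\ansi\\ansicpg1252);
    that header is not produced here.
    """
    out = []
    for ch in text:
        if ch in "\\{}":
            # RTF control characters must be escaped with a backslash.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            try:
                # Characters in cp1252 become \'hh hex escapes, e.g. ë -> \'eb.
                byte = ch.encode("cp1252")[0]
                out.append("\\'%02x" % byte)
            except UnicodeEncodeError:
                # Anything else: one \uN? per UTF-16 code unit, N signed 16-bit.
                data = ch.encode("utf-16-le")
                for i in range(0, len(data), 2):
                    unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                    out.append("\\u%d?" % unit)
    return "".join(out)


print(unicode_to_rtf("Arjën"))  # Arj\'ebn
print(unicode_to_rtf("5 €"))    # 5 \'80
```

Escaping `\`, `{` and `}` matters because they are RTF control characters. Line breaks are passed through unchanged, so multi-line input would still need them converted to `\par`.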

Thanks & best wishes, Kees



More information about the Python-list mailing list