Replacement in unicodestrings?

"Martin v. Löwis" martin at
Sun Oct 5 07:31:22 CEST 2008

>         s_str=repr(s.encode('UTF-8'))

It would be easier to encode this in cp1252 here, as this is apparently
the encoding that you want to use in the RTF file, too. You could then
loop over the encoded string, replacing each byte >= 128 with "\\'%.2x" % ord(byte).
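
A minimal sketch of that loop, written for Python 3 (an assumption; the thread itself uses Python 2, where you would call ord() on each character of the encoded string). The name rtf_escape_cp1252 is made up for illustration, and the sketch does not escape RTF's own special characters (backslash and braces).

```python
def rtf_escape_cp1252(text):
    """Encode text as cp1252 and write every byte >= 128 as an RTF \\'xx escape."""
    data = text.encode('cp1252')  # raises UnicodeEncodeError for characters outside cp1252
    pieces = []
    for byte in data:  # iterating over bytes yields ints in Python 3
        if byte >= 128:
            pieces.append("\\'%.2x" % byte)
        else:
            pieces.append(chr(byte))
    return ''.join(pieces)
```

For example, rtf_escape_cp1252('Arj\xebn') gives the text Arj\'ebn, because the e-diaeresis is the single byte 0xeb in cp1252.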

As yet another alternative, you could create a Unicode error handler
(call it 'rtf'), and then do

          return s.encode('ascii', errors='rtf')
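
One way such a handler could look, as a sketch for Python 3 (an assumption; in Python 2 the handler would return a unicode replacement and use ord()). codecs.register_error is the standard registration call; the names rtf_error_handler and to_rtf_ascii are hypothetical. Characters that cp1252 cannot represent still raise an error inside the handler.

```python
import codecs

def rtf_error_handler(exc):
    """Replace each unencodable character with RTF \\'xx escapes of its cp1252 bytes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        for byte in ch.encode('cp1252'):
            pieces.append("\\'%.2x" % byte)
    # Return the replacement text and the position at which encoding resumes.
    return ''.join(pieces), exc.end

codecs.register_error('rtf', rtf_error_handler)

def to_rtf_ascii(text):
    # The replacement is pure ASCII, so the ascii codec can encode it.
    return text.encode('ascii', errors='rtf')
```

With the handler registered, to_rtf_ascii('Arj\xebn') returns the bytes for Arj\'ebn, and ASCII input passes through unchanged.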

>         replDic={'\xc3\xa0':"\\'e0",'\xc3\xa4':"\\'e4",'\xc3\xa1':"\\'e1",
>                 '\xc3\xa8':"\\'e8",'\xc3\xab':"\\'eb",'\xc3\xa9':"\\'e9",
>                 '\xc3\xb2':"\\'f2",'\xc3\xb6':"\\'f6",'\xc3\xb3':"\\'f3",
>                 '\xe2\x82\xac':"\\'80"}
>         for k in replDic.keys():
>             if repr(k) in s_str:
>                 s_str=s_str.replace(repr(k),replDic[k])
>         return s_str
> However interactive:
> >>> '\xc3\xab' in 'Arj\xc3\xabn'
> True
> I just don't get it, what's the difference?

It's the repr():

py> '\xc3\xab' in 'Arj\xc3\xabn'
True
py> repr('\xc3\xab') in repr('Arj\xc3\xabn')
False
py> repr('\xc3\xab')
"'\\xc3\\xab'"
py> repr('Arj\xc3\xabn')
"'Arj\\xc3\\xabn'"

repr('\xc3\xab') starts with an apostrophe, which doesn't
appear before the \\xc3 in repr('Arj\xc3\xabn').
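
If you want to keep the lookup-table approach, dropping repr() altogether avoids the problem. A sketch for Python 3 (an assumption; in Python 2 the same idea works on plain str objects), with the hypothetical name replace_utf8 and the table taken from the quoted code:

```python
# UTF-8 byte sequences mapped to their RTF escapes (cp1252 code points).
REPL = {
    b'\xc3\xa0': b"\\'e0", b'\xc3\xa4': b"\\'e4", b'\xc3\xa1': b"\\'e1",
    b'\xc3\xa8': b"\\'e8", b'\xc3\xab': b"\\'eb", b'\xc3\xa9': b"\\'e9",
    b'\xc3\xb2': b"\\'f2", b'\xc3\xb6': b"\\'f6", b'\xc3\xb3': b"\\'f3",
    b'\xe2\x82\xac': b"\\'80",
}

def replace_utf8(text):
    """Encode to UTF-8 and substitute on the raw bytes, without going through repr()."""
    data = text.encode('utf-8')
    for key, value in REPL.items():
        data = data.replace(key, value)
    return data
```

Because the keys are compared against the encoded bytes themselves, no quote characters get in the way. Characters missing from the table stay as raw UTF-8 bytes.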

