how to decode rtf characterset ?
MRAB
python at mrabarnett.plus.com
Mon Feb 1 12:17:31 EST 2010
Stef Mientki wrote:
> hello,
>
> I want to translate rtf files to unicode strings.
> I succeeded in removing all the tags,
> but now I'm stuck on the special accented characters,
> like:
>
> "Vóór"
>
> the character "ó" is represented by the string r"\'f3",
> or in bytes: 92, 39, 102, 51
>
> so I think I need a way to translate that into the string r"\xf3"
> but I can't find a way to accomplish that.
>
> Any suggestions are very welcome.
>
Change r"\'f3" to r"\xf3" and then decode to Unicode:
>>> s = r"\'f3"
>>> s = s.replace(r"\'", r"\x").decode("unicode_escape")
>>> print s
ó
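
Note that the session above is Python 2 (str.decode and the print statement). It also relies on "unicode_escape" mapping \xf3 to U+00F3, which amounts to treating the byte as Latin-1, whereas RTF \'xx escapes are bytes in the document's code page. A rough Python 3 sketch of the same idea follows. The function name decode_rtf_hex_escapes and the cp1252 default are my own assumptions, not anything from the original post; a real file declares its code page in the header (\ansicpgN).

```python
import re

# Matches one RTF hex escape such as \'f3: backslash, apostrophe, two hex digits.
_HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_rtf_hex_escapes(text, codepage="cp1252"):
    """Replace each \\'xx escape in text with the character it encodes.

    Hypothetical helper: codepage defaults to cp1252 as an assumption.
    Each escape is decoded on its own, so this only suits single-byte
    code pages.
    """
    def _replace(match):
        byte = bytes([int(match.group(1), 16)])
        # Bytes the code page leaves undefined become U+FFFD instead of raising.
        return byte.decode(codepage, errors="replace")

    return _HEX_ESCAPE.sub(_replace, text)


print(decode_rtf_hex_escapes(r"V\'f3\'f3r"))  # prints: Vóór
```

For double-byte code pages (cp932 and the like), consecutive escapes would have to be collected into one bytes object before decoding, which this sketch does not attempt.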