How to decode RTF character set?

MRAB python at mrabarnett.plus.com
Mon Feb 1 12:17:31 EST 2010


Stef Mientki wrote:
> hello,
> 
> I want to translate RTF files to Unicode strings.
> I succeeded in removing all the tags,
> but now I'm stuck on the special accented characters,
> like:
> 
> "Vóór"
> 
> the character "ó" is represented by the string r"\'f3",
> or in bytes: 92, 39, 102, 51
> 
> so I think I need a way to translate that into the string r"\xf3"
> but I can't find a way to accomplish that.
> 
> Any suggestions are very welcome.
> 
Change r"\'f3" to r"\xf3" and then decode to Unicode:

 >>> s = r"\'f3"
 >>> s = s.replace(r"\'", r"\x").decode("unicode_escape")
 >>> print s
ó
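
The session above is Python 2, where a str still has a .decode method. In
Python 3 it does not, so the same idea needs a different shape. What follows
is only a minimal sketch, not part of the original reply: the function name
decode_rtf_hex_escapes and the cp1252 default are assumptions made for
illustration. An RTF file normally declares its real code page in the header
(the \ansicpg control word), and that value should be passed in instead of
relying on the default.

```python
import re

# One RTF hex escape: backslash, apostrophe, two hex digits (e.g. \'f3).
_HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")
# A run of consecutive escapes, decoded together so that multi-byte
# code pages are handled as a unit.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_rtf_hex_escapes(text, codepage="cp1252"):
    """Replace each run of RTF hex escapes with the characters it encodes.

    The codepage default is an assumption; use the one the RTF header declares.
    """
    def _decode_run(match):
        # Collect the byte values of every escape in this run.
        raw = bytes(int(h, 16) for h in _HEX_ESCAPE.findall(match.group(0)))
        # Bytes that are invalid in the code page become U+FFFD.
        return raw.decode(codepage, errors="replace")

    return _HEX_RUN.sub(_decode_run, text)


# The example from the question: r"V\'f3\'f3r" becomes "Vóór".
result = decode_rtf_hex_escapes(r"V\'f3\'f3r")
```

Unlike "unicode_escape", which maps each \xhh to the code point with the same
number (Latin-1 in effect), this decodes through an explicit code page. The
two agree for "ó" (f3), but differ for bytes such as 80, which is the euro
sign in cp1252.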
