how to decode rtf characterset ?
mal at egenix.com
Mon Feb 1 19:11:24 CET 2010
Stef Mientki wrote:
> I want to translate rtf files to unicode strings.
> I succeeded in removing all the tags,
> but now I'm stuck on the special accented characters,
> like :
> the character "ó" is represented by the string r"\'f3",
> or in bytes: 92, 39, 102, 51
> so I think I need a way to translate that into the string r"\xf3"
> but I can't find a way to accomplish that.
> Any suggestions are very welcome.
You could try something along these lines:
>>> s = r"\'f3"
>>> s = s.replace("\\'", "\\x")
>>> u = s.decode('unicode-escape')
However, this assumes that the RTF uses Latin-1 codes.
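
If the file uses a different code page (RTF headers usually declare one with the \ansicpgN control word), the hex escapes would have to be decoded with that code page instead. The following is only a rough sketch of that idea, written for Python 3; the function name decode_rtf_hex_escapes and the cp1252 default are assumptions made for illustration, not part of any library:

```python
import re

# Matches a run of one or more RTF hex escapes, e.g. \'f3 or \'e9\'f3.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_rtf_hex_escapes(text, encoding="cp1252"):
    """Replace RTF \\'xx escapes in text with the characters they encode.

    `encoding` is an assumption: pass the codec that corresponds to the
    code page declared in the RTF header.
    """
    def _decode(match):
        # Gather the hex digits of the whole run into raw bytes and decode
        # them together, so multi-byte code pages are not split mid-character.
        hex_digits = match.group(0).replace("\\'", "")
        return bytes.fromhex(hex_digits).decode(encoding, errors="replace")

    return _HEX_RUN.sub(_decode, text)


# Example: the escape from the question, embedded in a word.
result = decode_rtf_hex_escapes(r"educaci\'f3n")
```

Bytes that are not valid in the chosen code page are replaced with U+FFFD rather than raising an error, which seemed the more forgiving choice for text extraction.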
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611