how to decode rtf characterset ?
M.-A. Lemburg
mal at egenix.com
Mon Feb 1 13:11:24 EST 2010
Stef Mientki wrote:
> hello,
>
> I want to translate rtf files to unicode strings.
> I succeeded in removing all the tags,
> but now I'm stuck on the special accented characters,
> like :
>
> "Vóór"
>
> the character "ó" is represented by the string r"\'f3",
> or in bytes: 92, 39, 102, 51
> so I think I need a way to translate that into the string r"\xf3"
> but I can't find a way to accomplish that.
>
> Any suggestions are very welcome.
You could try something along these lines:
>>> s = r"\'f3"
>>> s = s.replace("\\'", "\\x")
>>> u = s.decode('unicode-escape')
>>> u
u'\xf3'
However, this assumes that the RTF text uses Latin-1 codes, i.e.
that each \'hh escape maps directly to the Unicode code point with
the same number.
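
If the file uses a different code page, the hex escapes have to be
decoded through that code page rather than mapped straight to code
points. Here is a minimal sketch in Python 3 syntax (the snippet above
is Python 2). The helper name decode_rtf_hex and the cp1252 default are
my own assumptions, not part of any RTF library; the code page actually
in use is normally declared by the \ansicpgN control word in the RTF
header, so pass the matching codec name if yours differs.

```python
import re

# One or more consecutive RTF hex escapes, e.g. \'f3 or \'f3\'f3.
# Runs are decoded together so multi-byte code pages also work.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_rtf_hex(text, encoding="cp1252"):
    """Replace RTF hex escapes in text with the characters they encode.

    `encoding` is an assumed default; use the code page named in the
    RTF header when it is known.
    """
    def _repl(match):
        # Strip the \' prefixes, leaving only the hex digits.
        hex_digits = match.group(0).replace("\\'", "")
        # Turn the digits into raw bytes, then decode via the code page.
        return bytes.fromhex(hex_digits).decode(encoding, errors="replace")

    return _HEX_RUN.sub(_repl, text)


print(decode_rtf_hex(r"V\'f3\'f3r"))
```

For bytes in the range 0xA0-0xFF, Latin-1 and cp1252 give the same
result, so the "Vóór" example works with either. They differ for
0x80-0x9F, where cp1252 has printable characters such as the euro sign.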
--
Marc-Andre Lemburg
eGenix.com