RTF Parsing

Victor Subervi victorsubervi at gmail.com
Tue Jul 29 15:08:21 CEST 2008


Hi,
I have this code:
import re

def a():
    chars = ['\\i0', '\\u0', '\\qc', '\\b0', '\\ql', '\\i', '\\u', '\\b',
             '\\yz']
    rtf_markup = 'viewkind4\uc1\pard\nowidctlpar\qc\i\f0\fs36 Who is like the Beast? Who can wage war against him?\par'
    for char in chars:
        c = '(?<=' + char + ')'
        test = re.search(c, rtf_markup)
        try:
            junk = test.group(0)
            print char
        except:
            pass
which gives this result:
>>> a()
\qc
\b0
\i
\u
\b
>>>
which makes no sense at all. I expected this:
>>> a()
\qc
\i
>>>
Why do I get more than that? Also, if I change this line thus:
c = '(?<=' + char + ')[0 ]'
I get this result:
>>> a()
\b
>>>
Why?
TIA,
Victor
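
A possible explanation, offered as a sketch rather than a definitive answer. Two kinds of escaping seem to interact here. First, rtf_markup is not a raw string, so '\n' and '\f' inside it become a newline and a form feed before the regex ever sees them. Second, each entry in chars is used directly as a regex, where \b means "word boundary" and, in the Python 2 re module, unrecognised escapes such as \i, \u and \q simply match the letter itself. So '\i' matches any "i", '\u' matches any "u", and '\b0' matches the "0" that follows the form feed. That would also account for the second result: only a word boundary is immediately followed by "0" or a space. The version below assumes the goal is to report which whole RTF control words occur in the markup; the names CONTROL_WORD and find_control_words are hypothetical, and the control-word pattern is a simplification of the RTF syntax.

```python
import re

# Raw strings keep every backslash literal, so \n and \f in the markup are
# not converted into a newline and a form feed.
RTF_MARKUP = (r'viewkind4\uc1\pard\nowidctlpar\qc\i\f0\fs36 '
              r'Who is like the Beast? Who can wage war against him?\par')

# Simplified control word: a backslash, one or more letters, then an
# optional (possibly negative) numeric parameter.
CONTROL_WORD = re.compile(r'\\[a-zA-Z]+(?:-?[0-9]+)?')


def find_control_words(candidates, markup):
    """Return the candidates that occur in markup as whole control words."""
    present = set(CONTROL_WORD.findall(markup))
    return [word for word in candidates if word in present]


chars = [r'\i0', r'\u0', r'\qc', r'\b0', r'\ql', r'\i', r'\u', r'\b', r'\yz']
found = find_control_words(chars, RTF_MARKUP)
for word in found:
    print(word)
```

Comparing whole tokens rather than searching for substrings means that '\u' is not reported just because '\uc1' is present. If a substring search is really what is wanted, passing each entry through re.escape() before building the pattern avoids the special meaning of \b.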