[Tutor] rtf to xml regexp question

Paul Tremblay phthenry@earthlink.net
Fri, 7 Jun 2002 17:10:07 -0400


I have found a Java utility that does a pretty good job of
converting RTF to XML. I had played around with the idea of
writing a parser myself, but realized it was pretty complicated.

However, this utility does not convert footnotes. Before I run
the utility, I want to convert footnotes into this format:

As Schmoo claims <footnote>footnote text at bottom of page.
Quoted from <i>Title</i></footnote>

In the RTF file, the same footnote currently looks like this:

\pard\plain Now a footnote. As Schmoo claims {\fs18\up6 \chftn
{\footnote \pard\plain \s246 \fs20  {\fs18\up6 \chftn }footnote
at bottom of page.Quoted from {\i Title}}}\par 

Majix, the RTF converter, will take care of a lot of this text,
converting the above to:

<p>Now a footnote. As Schmoo claims</p>

I am only interested in the RTF text between {\footnote and }}}.

There are a few tricky parts, though. The text may break over
several lines. Also, if the footnote does not end with the title
of the book, then the text I am interested in will end in two }}
rather than three.
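A plain regular expression shows why these two points matter. In
the sketch below (the sample strings are made up for
illustration), re.DOTALL lets "." match newlines, so the line
breaks are handled. But the pattern clips the closing brace of
{\i Title}, finds nothing when a footnote ends in only two braces,
and in a longer file would run on into the next footnote that does
end in "}}}".

```python
import re

# Naive approach (illustrative only): take everything from "{\footnote"
# up to the first "}}}". re.DOTALL lets "." match newlines, so a footnote
# that breaks over several lines is still found.
NAIVE = re.compile(r"\{\\footnote(.*?)\}\}\}", re.DOTALL)

# Hypothetical samples modelled on the RTF above.
ends_with_title = "claims {\\chftn {\\footnote note\nfrom {\\i Title}}}\\par"
ends_with_text = "claims {\\chftn {\\footnote see {\\i Title}, p. 4.}}\\par"

print(repr(NAIVE.search(ends_with_title).group(1)))  # ' note\nfrom {\\i Title'
print(NAIVE.search(ends_with_text))                  # None: no "}}}" anywhere
```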

The best method is to start a search that finds {\footnote. It
should add one to a footnote counter. Then the search should
continue from that point. If it finds another {, it should add
one to the footnote counter. If it finds a }, it should subtract
one from the footnote counter. It should stop searching when it
finds a } that brings the footnote counter back to 0.
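The counter described above can be sketched character by
character in Python. This is a minimal version under stated
assumptions: the function name footnote_body() is hypothetical,
and it treats \{, \} and \\ as RTF escapes (literal characters)
that should not change the count. The counter starts at 0 and the
"{" of "{\footnote" brings it to 1, which is the "add one" step.
It returns the footnote text and the index just past its closing
brace, which is where a search for the next footnote could resume.

```python
OPEN = "{\\footnote"

def footnote_body(rtf, start):
    """Brace-count from the "{\\footnote" at index `start`.

    Returns (body, end): the text between "{\\footnote" and its matching "}",
    and the index just past that closing brace.
    """
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            # Assumed RTF escapes: \{ \} and \\ are literal characters, not
            # group delimiters, so skip the backslash and the next character.
            i += 2
            continue
        if ch == "{":
            depth += 1          # the first "{" is the one in "{\footnote"
        elif ch == "}":
            depth -= 1
            if depth == 0:      # this "}" closes the footnote group
                return rtf[start + len(OPEN):i], i + 1
        i += 1
    raise ValueError("no matching } for footnote at index %d" % start)


sample = ("As Schmoo claims {\\fs18\\up6 \\chftn\n"
          "{\\footnote \\pard\\plain \\s246 \\fs20  {\\fs18\\up6 \\chftn }footnote\n"
          "at bottom of page.Quoted from {\\i Title}}}\\par ")
body, end = footnote_body(sample, sample.find(OPEN))
print(repr(body))
print(repr(sample[end:]))   # '}\\par ' : the brace of the outer {\fs18 ...} group
```

Because the loop counts braces instead of looking for a fixed
"}}}", the same code handles a footnote that ends in two braces.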

I don't know how to do this in Python. I remember that in Perl
you could start searching from where you left off.
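Python has a similar facility: the search() method of a compiled
pattern takes a starting index, pattern.search(text, pos), and
pattern.match(text, pos) is the closer analogue of an anchored \G
in Perl. (The module-level re.search() does not take a position;
its third argument is flags.) The sketch below, a hypothetical
wrap_footnotes() function, uses that to jump from one {\footnote
to the next and, inside each group, from brace to brace while
applying the counter. It assumes \{, \} and \\ are RTF escapes
that should not be counted, and it only wraps the group: control
words inside it such as \pard\plain and {\i Title} are left for
Majix or a later pass.

```python
import re

FOOTNOTE = re.compile(r"\{\\footnote")
# One token = a brace, or a backslash escape such as \{ \} \\ (never counted).
TOKEN = re.compile(r"\\.|[{}]", re.DOTALL)

def wrap_footnotes(rtf):
    """Rewrite each {\\footnote ...} group as <footnote>...</footnote>."""
    out = []
    pos = 0
    while True:
        m = FOOTNOTE.search(rtf, pos)      # resume where the last group ended
        if m is None:
            out.append(rtf[pos:])
            return "".join(out)
        depth = 1                          # counts the "{" of "{\footnote"
        scan = m.end()
        while depth:
            t = TOKEN.search(rtf, scan)    # jump to the next brace or escape
            if t is None:
                raise ValueError("unbalanced braces after index %d" % m.start())
            if t.group() == "{":
                depth += 1
            elif t.group() == "}":
                depth -= 1
            scan = t.end()
        out.append(rtf[pos:m.start()])
        out.append("<footnote>" + rtf[m.end():scan - 1] + "</footnote>")
        pos = scan


print(wrap_footnotes("Now {\\chftn {\\footnote note {\\i T}}} end"))
# Now {\chftn <footnote> note {\i T}</footnote>} end
```

Since each {\footnote is removed together with its matching }, the
braces left in the RTF still balance. Whether Majix passes the
<footnote> tags through untouched is something to check separately.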

Thanks!

Paul

-- 

************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************