[Tutor] rtf to xml regexp question
Paul Tremblay
phthenry@earthlink.net
Fri, 7 Jun 2002 17:10:07 -0400
I have found a Java utility that does a pretty good job of
converting rtf to xml. I had played around with the idea of
writing a parser myself, but realized it was pretty complicated.
However, this utility does not convert footnotes. Before I run
the utility, I want to convert footnotes into this format:
As Schmoo claims <footnote>footnote text at bottom of page.
Quoted from <i>Title</i></footnote>
The present rtf format looks like this:
\pard\plain Now a footnote. As Schmoo claims {\fs18\up6 \chftn
{\footnote \pard\plain \s246 \fs20 {\fs18\up6 \chftn }footnote
at bottom of page.Quoted from {\i Title}}}\par
Majix, the rtf converter, will take care of a lot of this text,
converting the above to:
<p>Now a footnote. As Schmoo claims</p>
I am only interested in the rtf text between {\footnote and }}}.
There are a few tricky parts, though. The text may break over
several lines. Also, if the title of the book does not end the
footnote, then the text I am interested in will end in two }}
rather than three.
The best method, I think, is to start a search that finds
{\footnote. It should add one to a footnote counter. Then the
search should continue from that point. If it finds another {,
then it should add another to the footnote counter. If it finds
a }, then it should subtract 1 from the footnote counter. It
should stop searching when it finds a } that brings the footnote
counter back to 0.
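
The counting approach described above could be sketched in Python
roughly as follows. This is only a minimal sketch, not a full rtf
parser: the function name find_footnote and the sample string are
made up for illustration, and the code assumes the whole file has
already been read into one string, so line breaks inside a footnote
do not matter. It also skips the character after every backslash so
that escaped braces (\{ and \}) are not counted.

```python
def find_footnote(text, start=0):
    # Hypothetical helper: return (begin, end) offsets of the next
    # {\footnote ...} group at or after `start`, or None if there is
    # no footnote or the braces never balance.
    marker = "{\\footnote"
    begin = text.find(marker, start)
    if begin == -1:
        return None

    depth = 0  # the "footnote counter" from the description
    pos = begin
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            # Skip the character after a backslash so that escaped
            # braces such as \{ and \} are not counted.
            pos += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # This } closes the brace that opened the footnote.
                return begin, pos + 1
        pos += 1
    return None


# Sample text taken from the rtf fragment quoted in this message.
sample = (
    "\\pard\\plain Now a footnote. As Schmoo claims {\\fs18\\up6 \\chftn\n"
    "{\\footnote \\pard\\plain \\s246 \\fs20 {\\fs18\\up6 \\chftn }footnote\n"
    "at bottom of page.Quoted from {\\i Title}}}\\par"
)

span = find_footnote(sample)
if span is not None:
    begin, end = span
    print(sample[begin:end])
```

Because the counter decides where the footnote ends, it makes no
difference whether the footnote happens to finish with two or three
closing braces.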
I don't know how to do this in Python. I remember that in Perl
you could start searching from where you left off.
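
For searching from where you left off, one option in Python is the
optional pos argument that the search method of a compiled pattern
accepts (str.find takes a start offset in the same way). The sketch
below only illustrates that idea; the names FOOTNOTE_START and
footnote_starts are hypothetical, and it merely collects the offset
of every {\footnote opener rather than extracting the footnote text.

```python
import re

# Matches the literal text {\footnote (both characters need escaping).
FOOTNOTE_START = re.compile(r"\{\\footnote")


def footnote_starts(text):
    # Hypothetical helper: return the offset of every {\footnote
    # opener, resuming each search where the previous match ended.
    starts = []
    pos = 0
    while True:
        match = FOOTNOTE_START.search(text, pos)
        if match is None:
            break
        starts.append(match.start())
        pos = match.end()  # continue from where we left off
    return starts


print(footnote_starts(r"a {\footnote x} b {\footnote y}"))
```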
Thanks!
Paul
--
************************
*Paul Tremblay *
*phthenry@earthlink.net*
************************