RE - non-greedy - some greediness - complete greediness

Doru-Catalin Togea doru-cat at ifi.uio.no
Sun Nov 10 03:57:28 EST 2002


Hi!

I am working on a little project where i need REs with a self-defined
level of greediness.

I am processing text in a Latex like manner, for those of you who are
familiar with it.

I want to be able to recognize Latex-like commands within my text. A
command starts with a character, say '\' followed by the command's name
and followed by the text on which the command applies enclosed in curly
brackets. Examples:

	\footnote{some text}
	\cite{some referance}
	\section{some title}

Recognizing such patterns and retriving the name of the command and the
text on which the command applies is not so difficult to achieve. Things
get complicated though when one encounters nested commands.

Say I have the following string:

 "abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
                                  ^     ^                           ^
 closing bracket of nested \cite  |     |                           |
    closing bracket of first \footnote  |                           |
                               closing bracket of second \footnote  |

By using non-greedy REs, I would get recognized the following footnotes:
	1) \footnote{efgh \cite{myRef1}
	2) \footnote{hello there}

The first matching is wrong because the first '}' encountered belongs to
the nested \cite command. The second matching is correct.

By using greedy REs, I would get recognized the following pattern:
	1) \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}

This is wrong because a) there are two \footnote commands in my string, so
I should get two matchings, and b) the first \footnote command should be
applied only to the "efgh \cite{myRef1} ijkl" substring.

In other words I need to be able to specify the level of greediness my REs
should have. Is it possible? If yes, how?

I have an idea of how to solve my problem without REs, by counting the
opening and closing curly brackets and reasoning algorithmically about
where within my text each nested command begins and ends. The question is
can the above stated problem be solved elegantly by means of REs?

With best regards,
Catalin


	<<<< ================================== >>>>
	<<     We are what we repeatedly do.      >>
	<<  Excellence, therefore, is not an act  >>
	<<             but a habit.               >>
	<<<< ================================== >>>>





More information about the Python-list mailing list