RE - non-greedy - some greediness - complete greediness

Sun Nov 10 07:33:58 EST 2002

On Sun, 10 Nov 2002 09:57:28 +0100 (MET), Doru-Catalin Togea
<doru-cat at ifi.uio.no> wrote:

>Hi!
>
>I am working on a little project where i need REs with a self-defined
>level of greediness.
>
>I am processing text in a Latex like manner, for those of you who are
>familiar with it.
>
>I want to be able to recognize Latex-like commands within my text. A
>command starts with a character, say '\' followed by the command's name
>and followed by the text on which the command applies enclosed in curly
>brackets. Examples:
>
>	\footnote{some text}
>	\cite{some referance}
>	\section{some title}
>
>Recognizing such patterns and retriving the name of the command and the
>text on which the command applies is not so difficult to achieve. Things
>get complicated though when one encounters nested commands.
>
>Say I have the following string:
>
> "abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
>                                  ^     ^                           ^
> closing bracket of nested \cite  |     |                           |
>    closing bracket of first \footnote  |                           |
>                               closing bracket of second \footnote  |
>
>By using non-greedy REs, I would get recognized the following footnotes:
>	1) \footnote{efgh \cite{myRef1}
>	2) \footnote{hello there}
>
>The first matching is wrong because the first '}' encountered belongs to
>the nested \cite command. The second matching is correct.
>
>By using greedy REs, I would get recognized the following pattern:
>	1) \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}
>
>This is wrong because a) there are two \footnote commands in my string, so
>I should get two matchings, and b) the first \footnote command should be
>applied only to the "efgh \cite{myRef1} ijkl" substring.
>
>In other words I need to be able to specify the level of greediness my REs
>should have. Is it possible? If yes, how?
>
>I have an idea of how to solve my problem without REs, by counting the
>opening and closing curly brackets and reasoning algorithmically about
>where within my text each nested command begins and ends. The question is
>can the above stated problem be solved elegantly by means of REs?
>
>With best regards,
>Catalin
>
>
>	<<<< ================================== >>>>
>	<<     We are what we repeatedly do.      >>
>	<<  Excellence, therefore, is not an act  >>
>	<<             but a habit.               >>
>	<<<< ================================== >>>>
>

Quoting from the online chapter of Freidel's mastering regular
expressions:

"In fact, the problem is that you simply can't match arbitrarily nested
constructs with regular expressions. It just can't be done."

HTH,
G. Rodrigues