RE - non-greedy - some greediness - complete greediness
Gonçalo Rodrigues
op73418 at mail.telepac.pt
Sun Nov 10 07:33:58 EST 2002
On Sun, 10 Nov 2002 09:57:28 +0100 (MET), Doru-Catalin Togea
<doru-cat at ifi.uio.no> wrote:
>Hi!
>
>I am working on a little project where i need REs with a self-defined
>level of greediness.
>
>I am processing text in a Latex like manner, for those of you who are
>familiar with it.
>
>I want to be able to recognize Latex-like commands within my text. A
>command starts with a character, say '\' followed by the command's name
>and followed by the text on which the command applies enclosed in curly
>brackets. Examples:
>
> \footnote{some text}
> \cite{some referance}
> \section{some title}
>
>Recognizing such patterns and retriving the name of the command and the
>text on which the command applies is not so difficult to achieve. Things
>get complicated though when one encounters nested commands.
>
>Say I have the following string:
>
> "abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
> ^ ^ ^
> closing bracket of nested \cite | | |
> closing bracket of first \footnote | |
> closing bracket of second \footnote |
>
>By using non-greedy REs, I would get recognized the following footnotes:
> 1) \footnote{efgh \cite{myRef1}
> 2) \footnote{hello there}
>
>The first matching is wrong because the first '}' encountered belongs to
>the nested \cite command. The second matching is correct.
>
>By using greedy REs, I would get recognized the following pattern:
> 1) \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}
>
>This is wrong because a) there are two \footnote commands in my string, so
>I should get two matchings, and b) the first \footnote command should be
>applied only to the "efgh \cite{myRef1} ijkl" substring.
>
>In other words I need to be able to specify the level of greediness my REs
>should have. Is it possible? If yes, how?
>
>I have an idea of how to solve my problem without REs, by counting the
>opening and closing curly brackets and reasoning algorithmically about
>where within my text each nested command begins and ends. The question is
>can the above stated problem be solved elegantly by means of REs?
>
>With best regards,
>Catalin
>
>
> <<<< ================================== >>>>
> << We are what we repeatedly do. >>
> << Excellence, therefore, is not an act >>
> << but a habit. >>
> <<<< ================================== >>>>
>
Quoting from the online chapter of Freidel's mastering regular
expressions:
"In fact, the problem is that you simply can't match arbitrarily nested
constructs with regular expressions. It just can't be done."
HTH,
G. Rodrigues
More information about the Python-list
mailing list