need some regular expression help

Diez B. Roggisch deets at web.de
Sun Oct 8 04:49:50 EDT 2006


hanumizzle wrote:
> On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch <deets at web.de> wrote:
> >
> > Chris wrote:
> > > I need a pattern that  matches a string that has the same number of '('
> > > as ')':
> > > findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
> > > '((2x+2)sin(x))', '(log(2)/log(5))' ]
> > > Can anybody help me out?
> >
> > This is not possible with regular expressions - they can't "remember"
> > how many parens they already encountered.
>
> Remember that regular expressions are used to represent regular
> grammars. Most regex engines actually aren't regular in that they
> support fancy things like look-behind/ahead and capture groups...IIRC,
> these cannot be part of a true regular expression library.

Certainly true, and it always gives me a hard time, because I don't know
to what extent a regular expression might do the job nowadays, given
these extensions. It was so much easier back in the old times....

> With that said, the quote-unquote regexes in Lua have a special
> feature that supports balanced expressions. I believe Python has a
> PCRE lib somewhere; you may be able to use the experimental ??{ }
> construct in that case.

Even if it has - I'm not sure if it really does you good, for several
reasons:

 - Regexes - even enhanced ones - don't build trees. But a tree is what
   you ultimately want from an expression like sin(log(x)).

 - Even if they are more powerful these days, the theory of context-free
   grammars still applies. So if what you need isn't LL(k) but LR(k),
   how do you specify that to the regex engine?

 - Regexes are useful because of their compact notation; parsers give
   you a better-structured result.
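
To make the points above concrete, here is a minimal sketch in plain
Python that does not use the re module at all. It is an illustration
only, not a full expression parser: the names balanced_groups and nest
are made up for this example, and the handling of stray parentheses is
an assumption. The first function keeps a depth counter, which is
exactly the "memory" a true regular expression lacks; the second builds
a nested list as a crude stand-in for a parse tree.

```python
def balanced_groups(text):
    """Return the outermost balanced '(...)' substrings of text."""
    groups = []
    depth = 0      # number of currently open parentheses
    start = None   # index where the current outermost group began
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            if depth == 0:
                continue  # stray ')': ignored here (an assumption)
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


def nest(text):
    """Turn text into nested lists; each '(...)' becomes a sub-list."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for ch in text:
        if ch == '(':
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == ')':
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif stack[-1] and isinstance(stack[-1][-1], str):
            stack[-1][-1] += ch  # extend the current run of plain text
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


print(balanced_groups('42^((2x+2)sin(x)) + (log(2)/log(5))'))
print(nest('sin(log(x))'))
```

For Chris's example string, the first call yields the two groups he
asked for, and the second shows the kind of nested structure a regex
match would not hand you directly.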


Diez



