[Tutor] Regular Expression question

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Mon Apr 21 17:26:51 2003


On Sun, 20 Apr 2003, Michael Janssen wrote:

> > It appears that the aggregation functions of parentheses are limited.
> > This does not work:
> >
> > <html([ \t]*>)|([ \t:].+?>)
>
> Seems to be a point, where re gets very sophisticated ;-)
>
> You should try to explain in plain words what this re is expected to do
> (and each part of it). This often helps to find logical mistakes
> (Pointer: "ab|c" doesn't look for ab or ac but for ab or c).


And if it helps, think of consecutive characters as multiplication, and
the '|' symbol as addition.  That is,


   regex             math

---------------------------------

   ab|c     <==>    ab + c



Your algebraic training should kick in at this point.  *grin*


   ab|ac    <==>    ab + ac

   a(b|c)   <==>    a(b + c)

            <==>    ab + ac
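
If you'd like to see the "distributive law" hold on a real regex engine,
here is a small sketch using Python's re module.  The accepts() helper and
the sample strings are made up for illustration; re.fullmatch() is used so
that each pattern has to account for the whole string.

```python
import re

def accepts(pattern, text):
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# Hypothetical sample strings, chosen only for this demonstration.
samples = ["ab", "ac", "a", "b", "c", "abc"]

# a(b|c) is the "factored" form, ab|ac is the "expanded" form.
factored = [s for s in samples if accepts(r"a(b|c)", s)]
expanded = [s for s in samples if accepts(r"ab|ac", s)]

print(factored)   # ['ab', 'ac']
print(expanded)   # ['ab', 'ac']
```

Both patterns accept exactly the same strings, just as a(b + c) and
ab + ac name the same quantity.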


This is what we are trying to say when we mention "precedence".  In
mathematical expressions, multiplication "binds" more tightly than
addition.  In regular expressions, concatenation (adjacent characters)
"binds" more tightly than alternation (the '|' operator).  In both cases,
we can use parentheses to override the default way that things bind
together.
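
Here is a minimal sketch of that precedence rule in Python.  The accepts()
helper and the test strings are my own, and the regrouped pattern at the
end is only a guess at what the original regex was meant to do, not a
definitive fix.

```python
import re

def accepts(pattern, text):
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# ab|c groups as (ab)|(c): concatenation binds tighter than '|'.
print(accepts(r"ab|c", "ab"))     # True
print(accepts(r"ab|c", "c"))      # True
print(accepts(r"ab|c", "ac"))     # False

# Parentheses override the default grouping.
print(accepts(r"a(b|c)", "ac"))   # True
print(accepts(r"a(b|c)", "c"))    # False

# The pattern from the original question: the '|' splits the WHOLE regex,
# so the second alternative no longer requires the leading "<html".
original = r"<html([ \t]*>)|([ \t:].+?>)"
# Assumed intent: "<html" followed by either alternative.
regrouped = r"<html(([ \t]*>)|([ \t:].+?>))"

tag = "<html lang='en'>"          # hypothetical input
stray = " not a tag>"             # hypothetical input

print(accepts(original, tag))     # False
print(accepts(original, stray))   # True
print(accepts(regrouped, tag))    # True
print(accepts(regrouped, stray))  # False
```

In the regrouped version the extra pair of parentheses plays the same role
as the parentheses in a(b + c): it keeps the alternation from swallowing
the "<html" prefix.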