Regular Expressions in Python

Mon Mar 1 14:04:36 EST 2004

"fossil_blue" <noelrivera at excite.com> wrote in message
news:c7c33240.0403010618.12391181 at posting.google.com...
> Dear Gurus,
>
>   I am trying to find out how to write an effective regular expression
> in python for the following scenario:
>
>    "any number of leading spaces at the beginning of a line" "followed
> by a string" "there may be a string that starts with *"
>
> for example:
>
>   END  *This is a comment
>
> but I don't want to match this:
>
>    END  e * This is a line with an error (e)
>
> thanks,
> Noel

Here's some sample code that handles this with both the re module and
pyparsing.  Note that the single .ignore() call takes care of ignoring
comments on all contained grammar constructs, and that non-significant
whitespace is implicitly ignored (so there's no need to litter your matching
expressions with lots of opt_spaces-type content).

-- Paul
========================
from pyparsing import Word, alphas, alphanums, restOfLine, LineEnd, ParseException

testdata = """
END  *This is a comment
  END*This is a comment (but the next line has no comment)
  END
   END  e * This is a line with an error (e)"""
enquote = lambda st : ( '"%s"' % st )

print "test with pyparsing"
grammar = Word( alphas, alphanums ).setName("keyword") + LineEnd()
comment = "*" + restOfLine
grammar.ignore( comment )

for test in testdata.split("\n"):
    try:
        print enquote(test),"\n->",
        print grammar.parseString( test )
    except ParseException, pe:
        print pe

print

import re
print "test with re"
opt_spaces  = " *"
# identifier = "[A-Za-z_][A-Za-z0-9_]+" - I'm guessing this regexp should
# have ()'s for accessing content as a group (and * instead of +, so that
# one-character identifiers also match)
identifier  = r"([A-Za-z_][A-Za-z0-9_]*)"
comment     = r"\*.*"
opt_comment = "(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in testdata.split("\n"):
    print enquote(test),"\n->",
    match = pat.match(test)    # match once and reuse the result
    if match:
        print match.groups()
    else:
        print "Bad text"

========================
Gives this output:

test with pyparsing
""
-> Expected keyword (0), (1,1)
"END  *This is a comment"
-> ['END']
"  END*This is a comment (but the next line has no comment)"
-> ['END']
"  END"
-> ['END']
"   END  e * This is a line with an error (e)"
-> Expected end of line (8), (1,9)

test with re
""
-> Bad text
"END  *This is a comment"
-> ('END', '*This is a comment')
"  END*This is a comment (but the next line has no comment)"
-> ('END', '*This is a comment (but the next line has no comment)')
"  END"
-> ('END', None)
"   END  e * This is a line with an error (e)"
-> Bad text
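
For readers on Python 3, here is a minimal sketch of the same regular
expression written with re.VERBOSE and named groups, so that each piece of
the pattern can be commented in place.  It uses Pattern.fullmatch() (Python
3.4+) instead of a trailing "$"; one difference is that "$" also matches just
before a trailing newline, whereas fullmatch() does not.  The names
LINE_PATTERN and parse_line are hypothetical, and allowing tabs as well as
spaces is an assumption that was not part of the original question.

```python
import re

# Hypothetical Python 3 version of the regexp above.  ASSUMPTION: tabs as well
# as spaces may appear wherever the original pattern allows spaces.
LINE_PATTERN = re.compile(
    r"""
    [ \t]*                           # any number of leading spaces/tabs
    (?P<keyword>[A-Za-z_][A-Za-z0-9_]*)  # the keyword, e.g. END
    [ \t]*                           # optional whitespace before a comment
    (?P<comment>\*.*)?               # optional comment starting with '*'
    """,
    re.VERBOSE,  # whitespace inside [...] is still significant in VERBOSE mode
)


def parse_line(line):
    """Return (keyword, comment) for a valid line, or None for bad text."""
    match = LINE_PATTERN.fullmatch(line)  # the whole line must match
    if match is None:
        return None
    return match.group("keyword"), match.group("comment")


testdata = [
    "END  *This is a comment",
    "  END*This is a comment (but the next line has no comment)",
    "  END",
    "   END  e * This is a line with an error (e)",
]
for test in testdata:
    print(repr(test), "->", parse_line(test))
```

As in the original, the comment group is None when a line has no comment,
and the line with the stray "e" is rejected because nothing in the pattern
can consume it.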