Regular Expressions in Python
Paul McGuire
ptmcg at users.sourceforge.net
Mon Mar 1 14:04:36 EST 2004
"fossil_blue" <noelrivera at excite.com> wrote in message
news:c7c33240.0403010618.12391181 at posting.google.com...
> Dear Gurus,
>
> I am trying to find out how to write an effective regular expression
> in python for the following scenario:
>
> "any number of leading spaces at the beginning of a line" "followed
> by a string" "there may be a string that starts with *"
>
> for example:
>
> END *This is a comment
>
> but I don't want to match this:
>
> END e * This is a line with an error (e)
>
> thanks,
> Noel
Here's an example with sample code using both the re module and pyparsing.
Note that in the pyparsing version the single .ignore() call takes care of
skipping comments in every contained grammar construct, and non-significant
whitespace is skipped implicitly (so there is no need to litter your matching
expressions with lots of opt_spaces-type content).
-- Paul
========================
from pyparsing import Word, alphas, alphanums, restOfLine, LineEnd, ParseException

testdata = """
END *This is a comment
 END*This is a comment (but the next line has no comment)
 END
 END e * This is a line with an error (e)"""

enquote = lambda st : ( '"%s"' % st )

print "test with pyparsing"
grammar = Word( alphas, alphanums ).setName("keyword") + LineEnd()
comment = "*" + restOfLine
grammar.ignore( comment )

for test in testdata.split("\n"):
    try:
        print enquote(test),"\n->",
        print grammar.parseString( test )
    except ParseException, pe:
        print pe
print

import re

print "test with re"
opt_spaces = " *"
#identifier = "[A-Za-z_][A-Za-z0-9_]+" - I'm guessing this regexp should
# have ()'s for accessing content as a group
identifier = "([A-Za-z_][A-Za-z0-9_]+)"
comment = r"\*.*"
opt_comment = "(%s)?" % comment
pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in testdata.split("\n"):
    print enquote(test),"\n->",
    if pat.match(test):
        print pat.match(test).groups()
    else:
        print "Bad text"
========================
Gives this output:
test with pyparsing
""
-> Expected keyword (0), (1,1)
"END *This is a comment"
-> ['END']
" END*This is a comment (but the next line has no comment)"
-> ['END']
" END"
-> ['END']
" END e * This is a line with an error (e)"
-> Expected end of line (8), (1,9)
test with re
""
-> Bad text
"END *This is a comment"
-> ('END', '*This is a comment')
" END*This is a comment (but the next line has no comment)"
-> ('END', '*This is a comment (but the next line has no comment)')
" END"
-> ('END', None)
" END e * This is a line with an error (e)"
-> Bad text
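
For anyone running a current Python 3 interpreter, the re half of the script
above could be sketched roughly as follows. The pattern pieces are the ones
from the post; the classify() helper and the upper-case constant names are
hypothetical, introduced here only for illustration, and the pyparsing half
is left out so the sketch needs nothing beyond the standard library.

```python
import re

# Same pattern pieces as in the post, written as raw strings.
OPT_SPACES = r" *"
IDENTIFIER = r"([A-Za-z_][A-Za-z0-9_]+)"   # group 1: the keyword
OPT_COMMENT = r"(\*.*)?"                    # group 2: optional "*..." comment
PAT = re.compile(OPT_SPACES + IDENTIFIER + OPT_SPACES + OPT_COMMENT + "$")


def classify(line):
    """Hypothetical helper: return (keyword, comment) or None if no match."""
    m = PAT.match(line)
    return m.groups() if m else None


testdata = [
    "",
    "END *This is a comment",
    " END*This is a comment (but the next line has no comment)",
    " END",
    " END e * This is a line with an error (e)",
]

for test in testdata:
    result = classify(test)
    # Mirror the original script's report: the groups, or "Bad text".
    print('"%s"' % test, "\n->", result if result is not None else "Bad text")
```

One thing worth noting about the identifier pattern as posted: because the
second character class is followed by +, it only accepts keywords of two or
more characters, so a one-letter keyword such as "X" is rejected. Changing
the + to * would accept it, if that is what the input format calls for.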