simplest way to strip a comment from the end of a line?

Paul McGuire ptmcg at austin.rr.com
Thu Dec 4 17:35:42 EST 2008


Yowza!  My eyes glaze over when I see re's like this one:

r'(?m)^(?P<data>.*?(".*?".*?)*)(?:#.*?)?$'

Here's a simple recognizer that reads source code and suppresses
comments.  A comment is a '#' character followed by the rest of
the line.  The recognizer also needs to detect quoted strings, so
that any would-be '#' comment introducers inside a quoted string
*won't* incur the stripping wrath of the recognizer.  A quoted string
must be recognized before a '#' comment introducer, so that the whole
string, including any '#' inside it, is consumed first.
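
To make the left-to-right rule concrete without pyparsing, here is a
minimal hand-written sketch in plain Python.  It is an illustration, not
pyparsing's implementation: the function name strip_comment is
hypothetical, and it assumes strings open and close on the same line,
are delimited by ' or ", and may contain backslash escapes.  Unlike
pyparsing's quotedString, it treats an unterminated quote as running to
the end of the line, so a '#' after a stray apostrophe is not stripped.

```python
def strip_comment(line):
    """Remove a trailing '#' comment from line, ignoring '#' inside quotes.

    Hypothetical helper for illustration; assumes single-line strings
    delimited by ' or " with optional backslash escapes.
    """
    quote = None  # quote character of the string we are inside, or None
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                i += 1            # skip the escaped character
            elif ch == quote:
                quote = None      # closing quote: back to ordinary text
        elif ch in "'\"":
            quote = ch            # opening quote: '#' is now just text
        elif ch == "#":
            return line[:i]       # '#' outside any string starts a comment
        i += 1
    return line


for t in ["this is a test 2 #with a comment",
          "this is a '#gnarlier' test #with a comment"]:
    print(repr(strip_comment(t)))
```

The space before the '#' is kept, just as transformString leaves it in
the pyparsing version.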

With our input tests given as:

tests = '''this is a test 1
this is a test 2 #with a comment
this is a '#gnarlier' test #with a comment
this is a "#gnarlier" test #with a comment
'''.splitlines()

here is such a recognizer implemented using pyparsing.


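from pyparsing import quotedString, Suppress, restOfLine

# a comment is '#' plus everything after it; Suppress drops it from the output
comment = Suppress('#' + restOfLine)
# also match quoted strings, so a '#' inside a string is consumed with it
recognizer = quotedString | comment

for t in tests:
    print(t)
    print(recognizer.transformString(t))
    print()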


Prints:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
this is a "#gnarlier" test


For some added fun, add a parse action to quoted strings, to know when
we've really done something interesting:

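def detectGnarliness(tokens):
    # parse action: called with the matched tokens each time quotedString
    # matches; returning None leaves the tokens unchanged
    if '#' in tokens[0]:
        print("Ooooh, how gnarly! ->", tokens[0])
quotedString.setParseAction(detectGnarliness)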

Rerunning the same loop, our output now becomes:

this is a test 1
this is a test 1

this is a test 2 #with a comment
this is a test 2

this is a '#gnarlier' test #with a comment
Ooooh, how gnarly! -> '#gnarlier'
this is a '#gnarlier' test

this is a "#gnarlier" test #with a comment
Ooooh, how gnarly! -> "#gnarlier"
this is a "#gnarlier" test
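
The same "quoted string or comment" alternation can also be written with
the standard re module, which may be handy where pyparsing isn't
installed.  This is a swapped-in technique, not the pyparsing code above:
the pattern only approximates quotedString (single- or double-quoted,
backslash escapes, one line), and strip_comment_re and its on_string
callback are hypothetical names.  The replacement function plays a role
loosely similar to a parse action: it sees each quoted string as it is
matched and returns it unchanged, while comments are replaced with an
empty string.

```python
import re

# One alternation, tried at each position from left to right:
#   a double-quoted string, a single-quoted string, or '#' to end of line.
# Only the comment branch is captured, in the named group "comment".
TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'          # double-quoted string (approximation)
    r"|'(?:[^'\\]|\\.)*'"         # single-quoted string (approximation)
    r"|(?P<comment>#.*)"          # '#' comment running to end of line
)


def strip_comment_re(line, on_string=None):
    """Drop '#' comments from line; quoted strings pass through unchanged.

    on_string, if given, is called with each quoted string that matches.
    """
    def replace(m):
        if m.group("comment") is not None:
            return ""                 # suppress the comment
        if on_string is not None:
            on_string(m.group())      # e.g. report "gnarly" strings
        return m.group()              # keep the quoted string as-is
    return TOKEN.sub(replace, line)


def detect_gnarliness(s):
    if "#" in s:
        print("Ooooh, how gnarly! ->", s)


print(strip_comment_re("this is a '#gnarlier' test #with a comment",
                       detect_gnarliness))
```

Because a string branch that never finds its closing quote simply fails
to match, a stray apostrophe (as in "don't # comment") does not swallow
the comment, which is also how the pyparsing scan should behave.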


-- Paul


