String Manipulation Help!

Paul McGuire ptmcg at
Sat Jan 28 16:47:01 EST 2006

"Dave" <davidworley at> wrote in message
news:1138481853.165529.321870 at
> OK, I'm stumped.
> I'm trying to find newline characters (\n, specifically) that are NOT
> in comments.
> So, for example (where "<-" = a newline character):
> ==========================================
> 1: <-
> 2: /*<-
> 3: ----------------------<-
> 4:     comment<-
> 5: ----------------------<-
> 6: */<-
> 7: <-
> 8: <-
> 9: <-
> ==========================================
> I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
> NOT the others.

Dave -

Pyparsing has built-in support for detecting line breaks and comments, and
the syntax is pretty simple, I think.  Here's a pyparsing program that gives
your desired results:

from pyparsing import lineEnd, cStyleComment, lineno

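# sample input, reconstructed from the example quoted above
testsource = """
/*
----------------------
    comment
----------------------
*/



"""
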
# define the expression you want to search for
eol = lineEnd

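# specify that you don't want to match within C-style comments;
# leaveWhitespace() keeps the ignore expression from swallowing the
# whitespace (including newlines) just before a comment
eol.ignore(cStyleComment.leaveWhitespace())
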
# loop through all the occurrences returned by scanString
# and print the line number of that location within the original string
for toks,startloc,endloc in eol.scanString(testsource):
    print(lineno(startloc,testsource))

The expression you are searching for is pretty basic, just a plain
end-of-line, or pyparsing's built-in expression, lineEnd.  The curve you are
throwing is that you *don't* want eol's inside of C-style comments.
Pyparsing allows you to designate an "ignore" expression to skip undesirable
content, and fortunately, ignoring comments happens so often during parsing
that pyparsing includes common comment expressions for C, C++, Java, Python,
and HTML.  Next, pyparsing's searching method is scanString.  scanString
returns a generator that gives the matching tokens, start location, and end
location of every occurrence of the given parse expression, in your case,
eol.  Finally, in the body of our for loop, we use pyparsing's lineno
function to give us the line number of a string location within the original
string.

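For comparison, here is a minimal sketch of the same scan-and-skip idea using only the standard re module.  This is not how pyparsing works internally, and the helper name eol_lines_outside_comments is made up for illustration.  It matches either a whole C-style comment or a single newline, trying the comment first, and reports only the newline matches.  It assumes comments are properly terminated and does not try to handle comment markers inside string literals.

```python
import re

# Hypothetical pattern, not part of pyparsing: a complete /* ... */ comment
# (non-greedy, spanning lines) or a single newline. The comment alternative
# is listed first so newlines inside a comment are consumed with it.
_COMMENT_OR_EOL = re.compile(r"/\*.*?\*/|\n", re.DOTALL)


def eol_lines_outside_comments(source):
    """Return 1-based line numbers of newlines that are not inside /* */ comments."""
    lines = []
    for match in _COMMENT_OR_EOL.finditer(source):
        if match.group() == "\n":
            # Line number of a location: newlines before it, plus one.
            lines.append(source.count("\n", 0, match.start()) + 1)
    return lines


# Sample input reconstructed from the example in the question.
testsource = "\n/*\n----------------------\n    comment\n----------------------\n*/\n\n\n\n"
print(eol_lines_outside_comments(testsource))  # prints [1, 6, 7, 8, 9]
```

The regular expression plays the role of both the search expression and the ignore expression here; with pyparsing those two stay separate, which is easier to extend once you need more than one kind of comment.
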
About the only real wart on all this is that pyparsing implicitly skips over
leading whitespace, even when looking for expressions to be ignored.  In
order not to lose eols that are just before a comment (like your line 1), we
have to modify cStyleComment to leave leading whitespace.

Download pyparsing at

-- Paul
