[Tutor] Re: Finding C comments with regular expressions

Magnus Lycka magnus at thinkware.se
Wed Apr 21 16:29:06 EDT 2004


Danny Yoo wrote:
> Actually, C should behave as if:
> 
> > /* We comment out some code
> >
> > sprintf("We can use */

Silly me, of course. This is what the program I wrote did (I think).

But you still have to take string literals into consideration, since
a C comment can't start inside a string. E.g. the following C code
contains no comments:

a = " Oh /* dear";
b = " Here */ we go again";

I also think that the middle line below isn't commented out, right?

// /*
x = "Not commented out";
// */

On the other hand, the // in the line below doesn't start a C++
style comment...

/* // */

...so you can't just parse the code for all C++ comments first and
remove them, then parse for C style comments and remove those,
and you can't do it the other way around either. And even if you
only had one type of comment, you would have the same kind of problem
with string literals, as you noticed above.
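
To make the ordering problem concrete, here is a rough sketch of the
two naive two-pass strategies run on the snippets above. The patterns
and the function names (strip_c_then_cpp, strip_cpp_then_c) are made
up for this illustration; they deliberately ignore string literals
and each other's comment style, which is exactly the flaw.

```python
import re

# Naive patterns: neither knows about string literals or the other style.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
CPP_COMMENT = re.compile(r"//[^\n]*")


def strip_c_then_cpp(src):
    """Remove /* */ comments first, then // comments."""
    return CPP_COMMENT.sub("", C_COMMENT.sub("", src))


def strip_cpp_then_c(src):
    """Remove // comments first, then /* */ comments."""
    return C_COMMENT.sub("", CPP_COMMENT.sub("", src))


# The three snippets from this message.
strings = 'a = " Oh /* dear";\nb = " Here */ we go again";\n'
wrapped = '// /*\nx = "Not commented out";\n// */\n'
nested = "/* // */ y = 1;\n"

for name, src in [("strings", strings), ("wrapped", wrapped), ("nested", nested)]:
    print(name, repr(strip_c_then_cpp(src)), repr(strip_cpp_then_c(src)))
```

With these inputs, removing C comments first loses the x = ... line
of the second snippet, removing C++ comments first swallows the */
and the code after it in the third, and both orders mangle the
string literals in the first.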

I don't know how to write regular expressions to handle such things,
and I feel overwhelmed by the thought of trying to describe and
solve all the problems at once.

With the state machine approach I described, I can solve my problem
one step at a time. If I'm inside a macro I will get into a comment
if I run into /* or //, and I will get into a string literal if I run
into ", and the macro ends if I get to emacro.

If I'm in a string literal I get out of it if I find ", and it's an
error if the line ends first. (There might be \ escapes to consider, I guess.)

If I'm in a C++ comment, it will end when the line ends and so on.

Each piece is fairly simple. Combined they solve the problem. For speed
use Pyrex!
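
A minimal sketch of the idea, adapted to plain C source rather than
the macro format: there is no macro state or emacro handling here,
and the function name strip_comments and the state names are
invented for the example. It also assumes character literals such
as '"' do not occur, which real C code would need to handle.

```python
def strip_comments(src):
    """Remove /* */ and // comments from C-like source, leaving strings alone."""
    CODE, STRING, LINE_COMMENT, BLOCK_COMMENT = range(4)
    state = CODE
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        pair = src[i:i + 2]
        if state == CODE:
            if pair == "/*":
                state = BLOCK_COMMENT
                i += 2
                continue
            if pair == "//":
                state = LINE_COMMENT
                i += 2
                continue
            if ch == '"':
                state = STRING
            out.append(ch)
        elif state == STRING:
            if ch == "\\":
                # Copy the backslash and the escaped character together,
                # so an escaped quote does not end the string.
                out.append(pair)
                i += 2
                continue
            if ch == "\n":
                raise ValueError("string literal not closed before end of line")
            if ch == '"':
                state = CODE
            out.append(ch)
        elif state == LINE_COMMENT:
            if ch == "\n":
                # The comment ends with the line; keep the newline itself.
                state = CODE
                out.append(ch)
        else:  # BLOCK_COMMENT
            if pair == "*/":
                state = CODE
                i += 2
                continue
        i += 1
    return "".join(out)


print(repr(strip_comments('a = " Oh /* dear";\n/* // */ y = 1;\n')))
```

Each branch only has to know what can happen in its own state, so
a new rule (character literals, say) means adding one more state
rather than reworking a single large pattern.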


-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus at thinkware.se


