[Edu-sig] counting lexemes...
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Tue, 2 Apr 2002 17:49:52 -0800 (PST)
On 1 Apr 2002, Jeffrey Elkner wrote:
> i got such a great response to my last query that i'm trying another one
> ;-) is there anything out there already that i can use to parse python,
> c++, and java source files to get a listing and count of the lexemes
> that occur in each?
>
> i spent the better part of an afternoon writing python scripts to remove
> comments and docstrings so that i could compare line numbers, and i'm
> afraid parsing to get at the lexemes is beyond my ability within the
> time i have left to prepare my thesis.
The ANTLR parser generator by Terence Parr,
http://www.antlr.org/
includes an example lexer/parser for Java 1.3, so you might be able to
generate a Java lexer and parser with it and then drive them from Jython.
I also saw a link to a production-quality C lexer and parser.
This project looks interesting; if I have time, I'll see if I can cook up
something. *grin*
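For the Python files, at least, the standard library's tokenize module may
already get you most of the way. Here is a rough sketch, not a finished
tool: the function name count_lexemes and the choice of which token types
to skip (comments, newlines, indentation markers) are my own assumptions
about what should count as a lexeme, so adjust them to fit your thesis.
Note that docstrings come through as ordinary string tokens, so they are
counted unless you filter them separately.

```python
import io
import tokenize
from collections import Counter

# Token types that describe layout or comments rather than program content.
# Leaving these out is an assumption about what counts as a "lexeme".
SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def count_lexemes(source):
    """Return a Counter mapping each lexeme string to how often it occurs."""
    counts = Counter()
    # generate_tokens() wants a readline-style callable that yields str lines.
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in SKIP:
            continue
        counts[tok.string] += 1
    return counts


if __name__ == "__main__":
    sample = "x = 1  # set x\ny = x + 1\n"
    # Print the lexemes from most to least frequent.
    for lexeme, n in count_lexemes(sample).most_common():
        print(n, lexeme)
```

To run it over a real file, read the file's text and pass it in, for
example count_lexemes(open("foo.py").read()), where foo.py stands for
whatever source file you want to measure. This does nothing for the C++
and Java files; those would still need a separate lexer.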
Good luck to you!