
On 1 Apr 2002, Jeffrey Elkner wrote:
> i got such a great response to my last query that i'm trying another one ;-) is there anything out there already that i can use to parse python, c++, and java source files to get a listing and count of the lexemes that occur in each?
> i spent the better part of an afternoon writing python scripts to remove comments and docstrings so that i could compare line numbers, and i'm afraid parsing to get at the lexemes is beyond my ability within the time i have left to prepare my thesis.
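For the Python files, the standard library's tokenize module may already get you most of the way, since counting lexemes only needs a lexer and not a full parser. Here is a rough sketch, assuming a current Python 3; the function name count_lexemes and the choice of which token types to skip are my own guesses at what you want, not anything official. Note that docstrings will still show up as ordinary string tokens.

```python
import io
import tokenize
from collections import Counter

# Token types that describe layout rather than lexemes. Which ones to
# exclude is an assumption on my part; adjust the set to suit the thesis.
SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def count_lexemes(source):
    """Return a Counter mapping each lexeme string to how often it occurs."""
    counts = Counter()
    # generate_tokens() wants a readline-style callable that yields str lines.
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type not in SKIPPED:
            counts[tok.string] += 1
    return counts
```

Reading a file into a string and calling count_lexemes(text).most_common(20) should then give a listing of the most frequent lexemes along with their counts. This only covers Python source; the C++ and Java files need a separate lexer.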
The ANTLR parser generator by Terence Parr (http://www.antlr.org/) has an example lexer/parser for Java 1.3, so you might be able to generate a Java lexer and parser using ANTLR, and then drive it with Jython. I also saw a link to a production-quality C lexer and parser. This project looks interesting; if I have time, I'll see if I can cook up something. *grin* Good luck to you!