what are the most frequently used functions?

robert no-spam at no-spam-no-spam.com
Sat Oct 28 13:29:18 CEST 2006


Xah Lee wrote:
> I had an idea today.
> 
> I wanted to know what the most frequently used functions in the
> emacs lisp language are. I thought I could write a quick script that goes
> through all the elisp library locations and gets the word-frequency
> report I want.
> 
> I started with a simple program:
> http://xahlee.org/p/titus/count_word_frequency.py
> 
> and applied it to a Shakespeare text. Here's a sample result:
> http://xahlee.org/p/titus/word_frequency.html
> 
> Then, I wrote a more elaborate one that recurses through directories to
> work on the elisp code treasury.
> 
> The code is here:
> http://xahlee.org/x/count_word_frequency.py
> 
> and I got a strange result. The word “the” appeared at the top,
> along with many other English words. I quickly realized that these are
> due to lisp functions' doc strings (not comments).

It would be interesting to see whether the type-checking "the" in Lisp is still frequent once the doc strings are filtered out. I doubt it.

> At this point, it dawned on me that there's no easy way to work around
> this, unless I write this script in elisp, which has functions that
> read lisp code and can easily filter out doc strings.
> 
> Originally, I planned to use the word-frequency script on Perl, Python,
> and Java, as well as Elisp. However, it now seems to me that this
> task is nigh impossible. Each of these langs has its own doc string
> syntax. It's gonna be a heavy undertaking if the word-frequency script
> is to work with all these langs, since that amounts to writing a parser
> for each lang.
> 
> Alternatively, one can write multiple word-frequency scripts, one in each
> lang in question, since most langs have facilities to deal with their own
> syntax. However, this is still not trivial, and amounts to several
> programming efforts.

Editor code (maybe best Scintilla/Sc1; check also Emacs itself, ...) already has libraries for colorizing comments in all kinds of programming languages, and those could be reused to filter the comments out ...
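
For Python sources at least, no editor library is needed: the standard library's tokenize module already separates comments and string literals from names. Below is a minimal sketch of that idea, not the script from the original post. The function name name_frequencies and the sample text are made up for illustration, and the approach only covers Python, so every other language would still need its own tokenizer.

```python
import io
import tokenize
from collections import Counter


def name_frequencies(source):
    """Count identifier and keyword tokens in Python source text.

    Comments and string literals (which includes doc strings) arrive as
    COMMENT and STRING tokens, so they never reach the counter.
    """
    counts = Counter()
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME:
            counts[tok.string] += 1
    return counts


# Hypothetical sample: "the" occurs only in the doc string, a comment
# and a string literal, so it should not be counted at all.
sample = '''
def greet(name):
    """Return the greeting for the given name."""
    # the comment is skipped too
    return "the " + str(name)
'''

freq = name_frequencies(sample)
for word, count in freq.most_common():
    print(count, word)
```

To run this over a whole library tree, one could feed each file found by os.walk through the same function and sum the resulting counters.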

> Would anyone be interested in this problem?

I have a theory that "bad source code" has more if/else/elif/case/switch dispatching statements per number of code words (or lines) than "good code", independent of the language.

If you can count this ratio and correlate it with, say, an sf-ranking and with languages, that would be highly interesting to me. (If you do, please drop a pointer in this thread or under the same subject.)
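
As a rough illustration of how such a ratio could be counted for Python code, here is a sketch built on the standard tokenize module. The name branch_ratio, the keyword set and the sample function are assumptions chosen for this example. It counts only if/elif/else, since classic Python has no case/switch, and it also counts the if/else of conditional expressions, which may or may not be what the theory intends.

```python
import io
import tokenize

# Assumption: these are the dispatching keywords of interest for Python.
BRANCH_KEYWORDS = {"if", "elif", "else"}


def branch_ratio(source):
    """Return branching keywords divided by all code words.

    "Code words" here means NAME tokens (identifiers and keywords);
    comments, strings, numbers and operators are left out.
    """
    total = 0
    branches = 0
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME:
            total += 1
            if tok.string in BRANCH_KEYWORDS:
                branches += 1
    return branches / total if total else 0.0


# Hypothetical sample with three branching keywords.
sample = '''
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0
'''

ratio = branch_ratio(sample)
print("branch ratio: %.3f" % ratio)
```

Comparing this number across projects would still need a separate tokenizer per language and some agreed measure of "good" versus "bad" code, neither of which this sketch provides.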



-robert



More information about the Python-list mailing list