How to detect typos in Python programs
Bengt Richter
bokr at oz.net
Fri Jul 25 21:54:19 EDT 2003
On Fri, 25 Jul 2003 12:20:57 -0600, Bob Gailer <bgailer at alum.rpi.edu> wrote:
>At 07:26 PM 7/25/2003 +0530, Manish Jethani wrote:
>
>>Hi all,
>>
>>Is there a way to detect typos in a Python program before
>>actually having to run it? Let's say I have a function like this:
>>
>>  def server_closed_connection():
>>      session.abost()
>>
>>Here, abort() is actually misspelt. The only time my program
>>follows this path is when the server disconnects from its
>>end--and that's like once in 100 sessions. So sometimes I
>>release the program, people start using it, and then someone
>>reports this typo after 4-5 days of the release (though it's
>>trivial to fix manually at the user's end, or I can give a patch).
>>
>>How can we detect these kinds of errors at development time?
>>It's not practical for me to have a test script that can make
>>the program go through all (most) the possible code paths.
>
>consider:
> use a regular expression to get a list of all the identifiers in the program
> count occurrence of each by adding to/updating a dictionary
> sort and display the result
>
>program_text = """ def server_closed_connection():
>     session.abost()"""
>import re
># list of all identifiers
>words = re.findall(r'([A-Za-z_]\w*)\W*', program_text)
>wordDict = {}
># dict of identifiers w/ occurrence count
>for word in words: wordDict[word] = wordDict.setdefault(word,0)+1
>wordList = wordDict.items()
>wordList.sort()
>for wordCount in wordList: print '%-25s %3s' % wordCount
>
>output (approximate, as I used tabs):
>
>abost                       1
>def                         1
>server_closed_connection    1
>session                     1
>
>You can then examine this list for suspect names, especially those that
>occur once. We could apply some filtering to remove keywords and builtin names.
>
>We could add a comment at the start of the program containing all the valid
>names, and extend this process to report just the ones that are not in the
>valid list.
>
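The filtering idea in the quoted suggestion could be sketched roughly as follows in present-day Python 3 (the quoted snippet itself is Python 2). The function name `suspect_names` and its `valid_names` parameter are made up for illustration; `valid_names` stands in for the "comment containing all the valid names". Note that a plain regex also picks up words inside comments and strings.

```python
import builtins
import keyword
import re
from collections import Counter

def suspect_names(program_text, valid_names=()):
    """Return identifiers that occur exactly once and are not known names."""
    # Same idea as the quoted snippet: a letter or underscore followed by
    # word characters. This also matches words in comments and strings.
    words = re.findall(r'[A-Za-z_]\w*', program_text)
    counts = Counter(words)
    # Names to ignore: keywords, builtins, and a caller-supplied list of
    # valid names (hypothetical stand-in for the "valid list" comment).
    known = set(keyword.kwlist) | set(dir(builtins)) | set(valid_names)
    return sorted(w for w, n in counts.items() if n == 1 and w not in known)

program_text = """def server_closed_connection():
    session.abost()
"""
print(suspect_names(program_text, valid_names=["session"]))
```

Names that are defined once and never referenced in the same file (like the function name here) still show up, so the list needs a human eye.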
That's cool. If you want to go further and count only the symbols that the actual
program is using (excluding comments and string contents), try:
====< prtok.py >========================================================
#prtok.py
import sys, tokenize, glob, token

# symdir maps UPPERCASED name -> { actual spelling: occurrence count }
symdir={}

def tokeneater(type, tokstr, start, end, line, symdir=symdir):
    if (type==token.NAME):
        TOKSTR = tokstr.upper()     #should show up for this file
        if symdir.has_key(TOKSTR):
            d = symdir[TOKSTR]
            if d.has_key(tokstr):
                d[tokstr] += 1
            else:
                d[tokstr] = 1
        else:
            symdir[TOKSTR]={ tokstr:1 }

for fileglob in sys.argv[1:]:
    for filename in glob.glob(fileglob):
        symdir.clear()
        tokenize.tokenize(open(filename).readline, tokeneater)

        header = '\n====< '+filename+' >===='
        # names spelled more than one way go to multicase; the rest are
        # collected in singlecase as a side effect of the "or" clause
        singlecase = []
        multicase = [key for key in symdir.keys()
                        if len(symdir[key])>1 or singlecase.append(key)]
        for key in multicase:
            if header:
                print header
                print ' (Multicase symbols)'
                header = None
            for name, freq in symdir[key].items():
                print '%15s:%-3s'% (name, freq),
            print
        if header: print header; header = None
        print ' (Singlecase symbols)'
        # sort single-spelling names by frequency, rarest first
        byfreq = [symdir[k].items()[0] for k in singlecase]
        byfreq = [(n,k) for k,n in byfreq]
        byfreq.sort()
        npr = 0
        for freq, key in byfreq:
            if header:
                print header
                header = None
            print '%15s:%-3s'% (key, freq),
            npr +=1
            if npr%4==3: print
        print
========================================================================
Operating on itself and another little file (you can specify file glob expressions too):
[18:55] C:\pywk\tok>prtok.py prtok.py gt.py
====< prtok.py >====
 (Multicase symbols)
         tokstr:6            TOKSTR:4
           NAME:1              name:2
 (Singlecase symbols)
         append:1              argv:1             clear:1
            def:1               end:1            import:1              keys:1
            len:1              line:1              open:1                or:1
       readline:1              sort:1             start:1             upper:1
           else:2          fileglob:2           has_key:2             items:2
      multicase:2                 n:2               sys:2             token:2
     tokeneater:2              type:2              None:3          filename:3
           glob:3               npr:3        singlecase:3          tokenize:3
              d:4              freq:4                 k:4            byfreq:5
            for:8                if:8                in:8               key:8
         header:10            print:10           symdir:11

====< gt.py >====
 (Singlecase symbols)
       __name__:1              argv:1               def:1
             if:1                fn:2               for:2            import:2
             in:2              main:2             print:2               sys:2
            arg:3              glob:3
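
For anyone on Python 3, where `has_key`, the `print` statement and the callback form of `tokenize.tokenize` are gone, the same idea might be sketched as below. This is a rough adaptation, not a port of prtok.py: the names `name_counts` and `report` and the sample source are invented for illustration, and it skips keywords, which prtok.py does not.

```python
import io
import keyword
import token
import tokenize
from collections import defaultdict

def name_counts(source):
    """Map UPPERCASED name -> {actual spelling: count}, NAME tokens only."""
    symdir = defaultdict(lambda: defaultdict(int))
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Comments and string literals have their own token types, so only
        # identifiers and keywords reach this test; keywords are skipped.
        if tok.type == token.NAME and not keyword.iskeyword(tok.string):
            symdir[tok.string.upper()][tok.string] += 1
    return symdir

def report(source):
    """Return (names spelled with differing case, names seen exactly once)."""
    symdir = name_counts(source)
    multicase = {k: dict(v) for k, v in symdir.items() if len(v) > 1}
    once = sorted(name for v in symdir.values() if len(v) == 1
                  for name, n in v.items() if n == 1)
    return multicase, once

# Hypothetical sample: "Session" vs "session" and the misspelt "abost".
sample = '''\
def server_closed_connection(session):
    # abort the session
    Session.abost()
'''
multicase, once = report(sample)
print(multicase)
print(once)
```

The word "session" in the comment is not counted, which is the point of tokenizing instead of using a regular expression.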
Regards,
Bengt Richter