How to detect typos in Python programs

Bengt Richter bokr at oz.net
Fri Jul 25 21:54:19 EDT 2003


On Fri, 25 Jul 2003 12:20:57 -0600, Bob Gailer <bgailer at alum.rpi.edu> wrote:

>At 07:26 PM 7/25/2003 +0530, Manish Jethani wrote:
>
>>Hi all,
>>
>>Is there a way to detect typos in a Python program before
>>actually having to run it?  Let's say I have a function like this:
>>
>>   def server_closed_connection():
>>     session.abost()
>>
>>Here, abort() is actually misspelt.  The only time my program
>>follows this path is when the server disconnects from its
>>end--and that's like once in 100 sessions.  So sometimes I
>>release the program, people start using it, and then someone
>>reports this typo after 4-5 days of the release (though it's
>>trivial to fix manually at the user's end, or I can give a patch).
>>
>>How can we detect these kinds of errors at development time?
>>It's not practical for me to have a test script that can make
>>the program go through all (most) the possible code paths.
>
>consider:
>  use a regular expression to get a list of all the identifiers in the program
>  count occurrence of each by adding to/updating a dictionary
>  sort and display the result
>
>program_text = """  def server_closed_connection():
>     session.abost()"""
>import re
># list of all identifiers
>words = re.findall(r'([A-Za-z_]\w*)\W*', program_text)
>wordDict = {}
># dict of identifiers w/ occurrence count
>for word in words: wordDict[word] = wordDict.setdefault(word,0)+1
>wordList = wordDict.items()
>wordList.sort()
>for wordCount in wordList: print '%-25s %3s' % wordCount
>
>output (approximate, as I used tabs):
>
>abost                       1
>def                         1
>server_closed_connection    1
>session                     1
>
>You can then examine this list for suspect names, especially those that 
>occur once. We could apply some filtering to remove keywords and builtin names.
>
>We could add a comment at the start of the program containing all the valid 
>names, and extend this process to report just the ones that are not in the 
>valid list.
>
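The filtering suggested in the quoted message could look roughly like the sketch below. It is written for Python 3 (an assumption; the code in this thread is Python 2), and the `# valid-names:` comment marker and the `suspect_names` helper are hypothetical names invented for illustration, not an existing convention. Like the regular-expression approach above, it also counts words that appear inside comments and strings.

```python
import builtins
import keyword
import re
from collections import Counter

# Hypothetical marker for the "valid names" comment; this format is an assumption.
VALID_MARKER = "# valid-names:"

def suspect_names(program_text):
    """Return {name: count} for names that are not keywords, builtins, or declared valid."""
    valid = set()
    code_lines = []
    for line in program_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(VALID_MARKER):
            # Collect the declared names and keep the marker line out of the count.
            valid.update(stripped[len(VALID_MARKER):].split())
        else:
            code_lines.append(line)
    counts = Counter(re.findall(r"[A-Za-z_]\w*", "\n".join(code_lines)))
    known = set(keyword.kwlist) | set(dir(builtins)) | valid
    return {name: n for name, n in counts.items() if name not in known}

sample = """# valid-names: server_closed_connection session abort
def server_closed_connection():
    session.abost()
"""
for name, n in sorted(suspect_names(sample).items()):
    print("%-25s %3d" % (name, n))
```

For the sample text, only `abost` should remain in the report, since `def` is a keyword and the other names are declared valid.
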
That's cool. If you want to go further and count only the symbols the actual
program is using (excluding anything in comments or strings), try:

====< prtok.py >========================================================
#prtok.py
import sys, tokenize, glob, token

symdir={}

def tokeneater(type, tokstr, start, end, line, symdir=symdir):
    if (type==token.NAME):
        TOKSTR = tokstr.upper()         #should show up for this file
        if symdir.has_key(TOKSTR):
            d = symdir[TOKSTR]
            if d.has_key(tokstr):
                d[tokstr] += 1
            else:
                d[tokstr] = 1
        else:
            symdir[TOKSTR]={ tokstr:1 }

for fileglob in sys.argv[1:]:
    for filename in glob.glob(fileglob):
        symdir.clear()
        tokenize.tokenize(open(filename).readline, tokeneater)

        header = '\n====< '+filename+' >===='
        singlecase = []
        multicase = [key for key in symdir.keys()
                        if len(symdir[key])>1 or singlecase.append(key)]
        for key in multicase:
            if header:
                print header
                print '  (Multicase symbols)'
                header = None
            for name, freq in symdir[key].items():
                print '%15s:%-3s'% (name, freq),
            print
        if header: print header; header = None
        print '  (Singlecase symbols)'
        byfreq = [symdir[k].items()[0] for k in singlecase]
        byfreq = [(n,k) for k,n in byfreq]
        byfreq.sort()
        npr = 0
        for freq, key in byfreq:
                if header:
                    print header
                    header = None
                print '%15s:%-3s'% (key, freq),
                npr +=1
                if npr%4==3: print
        print
========================================================================
Operating on itself and another little file (you can specify file glob expressions too):

[18:55] C:\pywk\tok>prtok.py prtok.py gt.py

====< prtok.py >====
  (Multicase symbols)
         tokstr:6            TOKSTR:4
           NAME:1              name:2
  (Singlecase symbols)
         append:1              argv:1             clear:1
            def:1               end:1            import:1              keys:1
            len:1              line:1              open:1                or:1
       readline:1              sort:1             start:1             upper:1
           else:2          fileglob:2           has_key:2             items:2
      multicase:2                 n:2               sys:2             token:2
     tokeneater:2              type:2              None:3          filename:3
           glob:3               npr:3        singlecase:3          tokenize:3
              d:4              freq:4                 k:4            byfreq:5
            for:8                if:8                in:8               key:8
         header:10            print:10           symdir:11

====< gt.py >====
  (Singlecase symbols)
       __name__:1              argv:1               def:1
             if:1                fn:2               for:2            import:2
             in:2              main:2             print:2               sys:2
            arg:3              glob:3
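
For readers on Python 3, where `tokenize.tokenize(readline, tokeneater)` and `has_key` are no longer available in that form, the same idea might be sketched as below. This is only an approximation of prtok.py, not a port of it: the function names `names_by_case` and `report` are made up for illustration, and dropping keywords from the "seen once" list is an added assumption about what is useful.

```python
import io
import keyword
import tokenize
from collections import defaultdict

def names_by_case(source):
    """Map each upper-cased name to {spelling: count}, using NAME tokens only."""
    groups = defaultdict(dict)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:  # comments and string contents are skipped
            spellings = groups[tok.string.upper()]
            spellings[tok.string] = spellings.get(tok.string, 0) + 1
    return groups

def report(source):
    """Return (names that differ only by case, non-keyword names seen exactly once)."""
    groups = names_by_case(source)
    multicase = {key: v for key, v in groups.items() if len(v) > 1}
    once = sorted(
        name
        for v in groups.values() if len(v) == 1
        for name, n in v.items()
        if n == 1 and not keyword.iskeyword(name)
    )
    return multicase, once

sample = (
    "def server_closed_connection():\n"
    "    # abort the session\n"
    "    Session.abost()\n"
    "session = None\n"
)
multicase, once = report(sample)
print("multicase:", multicase)
print("seen once:", once)
```

The word `abort` in the comment is never counted, while `Session`/`session` are grouped as a case mismatch and `abost` is listed among the names seen only once.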

Regards,
Bengt Richter



