python tool: finding duplicate code

Michal Wallace sabren at manifestation.com
Wed May 29 08:48:44 EDT 2002


In "Refactoring: Improving the Design of Existing Code",
Martin Fowler and Kent Beck list duplicated code as their
number one "Code Smell".

Since I'm trying to clean up some of my projects, I went
looking for a way to find duplicate lines in my Python
code... so I wrote a little Python program to do it for
me.

Features:

    - lists all pairs of overlapping files
    - shows how many lines are in common
    - shows each duplicated line
    - ignores indentation
    - filters out try, pass, if __name__=="__main__", etc.
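For anyone curious how features like these can fit together, here is a rough sketch. It is *not* the code in overlaps.py. It is a minimal illustration that assumes the filter is a fixed set of stripped lines. The names `IGNORED`, `significant_lines`, `overlaps`, and `report` are all hypothetical.

```python
import itertools

# Hypothetical filter list: stripped lines too common to count as duplication.
IGNORED = {
    "try:", "pass", "else:",
    'if __name__=="__main__":',
    'if __name__ == "__main__":',
    "if __name__ == '__main__':",
}


def significant_lines(text):
    """Return the set of non-blank, non-ignored lines, with indentation stripped."""
    result = set()
    for line in text.splitlines():
        stripped = line.strip()  # ignoring indentation = comparing stripped lines
        if stripped and stripped not in IGNORED:
            result.add(stripped)
    return result


def overlaps(sources):
    """Yield (name_a, name_b, shared_lines) for every pair of files with lines in common.

    `sources` maps a file name to that file's source text.
    """
    lines = {name: significant_lines(text) for name, text in sources.items()}
    for a, b in itertools.combinations(sorted(lines), 2):
        shared = lines[a] & lines[b]
        if shared:
            yield a, b, sorted(shared)


def report(paths):
    """Read the given files and print each overlapping pair with its shared lines."""
    sources = {}
    for path in paths:
        with open(path) as f:
            sources[path] = f.read()
    for a, b, shared in overlaps(sources):
        print(f"{a} <-> {b}: {len(shared)} lines in common")
        for line in shared:
            print("    " + line)
```

Because each file is reduced to a set, a line repeated several times in one file counts once, and line order is ignored. That is crude, but it is enough to flag pasted blocks. Calling `report(["a.py", "b.py"])` would print one entry per overlapping pair.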

It's just string-matching, so it won't find duplicate logic
with different variable names or layout, but it *can* find
cut and paste issues.

(hmm... Come to think of it, someone could probably find
*some* duplicate logic by running source files through the
tokenizer first. I wonder if that would work...)
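The tokenizer idea above is just speculation, but a minimal sketch of it with the standard library's `tokenize` module might look like the code below. Identifiers become `NAME` and literals become `LIT`, so copies that differ only in variable names or layout normalize to the same lines. The function name `normalized_lines` and the placeholder strings are hypothetical choices.

```python
import io
import keyword
import tokenize


def normalized_lines(source):
    """Return one token string per logical line, with identifiers and literals
    replaced by placeholders so renamed copies compare equal."""
    lines = []
    current = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            current.append("NAME")  # any non-keyword identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            current.append("LIT")  # numeric and string literals
        elif tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
            if current:  # end of a logical line
                lines.append(" ".join(current))
            current = []
        elif tok.type in (tokenize.COMMENT, tokenize.NL,
                          tokenize.INDENT, tokenize.DEDENT):
            continue  # layout and comments don't matter
        else:
            current.append(tok.string)  # operators, keywords, etc.
    return lines
```

The normalized lines could then be compared across files just like the raw lines. The catch is more false positives, since something as generic as `a = 1` and `b = 2` now looks identical.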

Anyway, just thought I'd share:

http://cvs.sabren.com/sixthdev/cvsweb.cgi/sdunit/overlaps.py?rev=1.1

Cheers,

- Michal   http://www.sabren.net/   sabren at manifestation.com 
------------------------------------------------------------
Learn to build web apps!      http://www.webAppWorkshop.com/
------------------------------------------------------------