On 05 July 07:27, Lionel Barret wrote:
My name is Lionel Barret. I attended Florent Xicluna's EuroPython talk on Tuesday, and it reminded me of a clone detection tool I used in the past (on a 100k SLOC codebase).
I talked about it with a few people (Florent Xicluna, Joe Gordon) and they were interested. Florent told me this was the list for this kind of discussion.
The tool, named clonedigger (http://clonedigger.sourceforge.net/), detects copy/pasted code, or independently written versions of the same classes/functions, across a big codebase. In my last job, I used to get a daily HTML report: a big overview of everything that had been copy/pasted or rewritten. It was really useful.
Sadly, it is unmaintained; the last upload dates from 2011. Besides, it's using old packages (like the compiler package) and is likely incompatible with Python 3 (either for running or for analyzing).
I really think this kind of tool should be part of any code-quality toolbox, like pyflakes, pep8, etc.
(The tool itself is GPL, so no blocker there.)
I just wanted to see if anybody would be interested in an updated version of the tool, and who could help. Off the top of my head, the next steps would be contacting the original author, evaluating the work to do (obsolete modules used and Python 3 incompatibilities) and eventually refactoring the code.
How does it compare to Pylint's similarity checker? Basically, it reports copy/pasted/rewritten code spanning more than a configurable number of lines, after some normalisation.
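
To make the idea concrete, here is a minimal sketch of line-based duplicate detection after normalisation. It is not Pylint's actual implementation, only an illustration of the general approach. The names `normalise`, `find_duplicates` and `min_lines` are made up for this example, and the normalisation (strip whitespace, drop blank lines and trailing comments) is an assumption about what "some normalisation" could look like.

```python
from collections import defaultdict


def normalise(line):
    """Drop a trailing comment and surrounding whitespace.

    Naive on purpose: a '#' inside a string literal is also cut.
    """
    return line.split("#", 1)[0].strip()


def find_duplicates(sources, min_lines=4):
    """Group locations that share `min_lines` consecutive normalised lines.

    `sources` maps a file name to its text. Each returned group is a list
    of (file name, starting line number) pairs.
    """
    windows = defaultdict(list)
    for name, text in sources.items():
        # Pair each line with its original number, then skip blank lines.
        numbered = [(no, normalise(raw))
                    for no, raw in enumerate(text.splitlines(), 1)]
        numbered = [(no, norm) for no, norm in numbered if norm]
        # Slide a window of `min_lines` lines over the file.
        for i in range(len(numbered) - min_lines + 1):
            chunk = tuple(norm for _, norm in numbered[i:i + min_lines])
            windows[chunk].append((name, numbered[i][0]))
    # Keep only the windows seen at more than one location.
    return [locs for locs in windows.values() if len(locs) > 1]
```

Calling `find_duplicates({"a.py": text_a, "b.py": text_b}, min_lines=4)` returns the starting locations of every 4-line run that appears more than once. A purely line-based check like this misses clones where identifiers were renamed, which is the kind of case a syntax-tree-based tool is better placed to catch.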