Hi Sylvain,

Sorry for the late answer: I went straight from EuroPython to a week full of meetings, and then to holidays (almost) off the grid.

I haven't followed Pylint's progress, but two years ago clonedigger was the clear winner.

But you're right, I need to see whether this other tool is worth pursuing.
I'll test again in a week or two to see how they compare now.
It's only a single data point, but it's a start.


On Fri, Jul 5, 2013 at 10:55 AM, Sylvain Thénault <sylvain.thenault@logilab.fr> wrote:
Hello Lionel,

On 05 July 07:27, Lionel Barret wrote:
> My name is Lionel Barret. I attended Florent Xicluna's EuroPython talk on
> Tuesday, and it reminded me of a clone detection tool I used in the past (on
> a 100k SLOC codebase).
> I talked about it with a few people (Florent Xicluna, Joe Gordon) and they
> were interested. Florent told me this was the list for this kind of
> discussion.
> The tool, named clonedigger (http://clonedigger.sourceforge.net/), detects
> copy/pasted code or independent rewrites of the same classes/functions
> across a big codebase. In my last job, I used to get a daily HTML report, a
> big overview of the things that had been copy/pasted/rewritten. It was
> really useful.
> Sadly, it is unmaintained: the last upload dates from 2011. Besides, it's
> using old packages (like the compiler package) and is likely incompatible with
> Python 3 (either for running or for analyzing).
> I really think this kind of tool should be part of any code-quality
> toolbox, like pyflakes, pep8, etc.
> (The tool itself is GPL, so nothing blocking there.)
> I just wanted to see if anybody would be interested in an updated version
> of the tool and who could help. Off the top of my head, the next steps
> would be contacting the original author, evaluating the work to do (obsolete
> modules used and Python 3 incompatibilities) and possibly refactoring the
> code.

How does it compare to Pylint's similarity checker? Basically, it reports
copy/pasted/rewritten code spanning more than a configurable number of
lines, after some normalisation.
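
To make the idea concrete, here is a rough sketch of that kind of check. It
is not Pylint's actual code: the names normalise, find_similar and min_lines
are made up for illustration, and the normalisation is deliberately naive
(it drops blank lines, indentation and anything after a "#", so a "#" inside
a string literal would be mishandled).

```python
def normalise(source):
    """Return (original_line_number, stripped_text) for each significant line.

    Assumption: "normalisation" here only means dropping blank lines,
    indentation and trailing "#" comments.
    """
    lines = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        text = line.split("#", 1)[0].strip()  # naive comment removal
        if text:
            lines.append((lineno, text))
    return lines


def find_similar(src_a, src_b, min_lines=4):
    """Report runs of identical normalised lines shared by two sources.

    Returns a list of (start_line_in_a, start_line_in_b, run_length),
    keeping only runs of at least min_lines lines.
    """
    a = normalise(src_a)
    b = normalise(src_b)
    reports = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i][1] != b[j][1]:
                continue
            # Skip positions in the middle of a run that was already counted.
            if i > 0 and j > 0 and a[i - 1][1] == b[j - 1][1]:
                continue
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length][1] == b[j + length][1]):
                length += 1
            if length >= min_lines:
                reports.append((a[i][0], b[j][0], length))
    return reports


SRC_A = "def f(x):\n    y = x + 1  # add one\n    z = y * 2\n    return z\n"
SRC_B = "import os\n\ndef g(x):\n    y = x + 1\n\n    z = y * 2\n    return z\n"

# The three body lines are shared; the "def" lines differ.
print(find_similar(SRC_A, SRC_B, min_lines=3))
```

Raising min_lines above the length of the shared run makes the report empty,
which is the "configurable number of lines" part.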

Sylvain Thénault, LOGILAB, Paris - Toulouse
Python, Debian, Agile methods training:  http://www.logilab.fr/formations
Custom software development:             http://www.logilab.fr/services
CubicWeb, the semantic web framework:    http://www.cubicweb.org

Lionel Barret,
LBdN Consulting
LinkedIn Profile : http://www.linkedin.com/in/lionelbarretdenazaris
Viadeo : http://fr.viadeo.com/fr/profile/lionel.barretdenazaris
Member of l'Arsenal Numérique