python tool: finding duplicate code

Michal Wallace sabren at manifestation.com
Thu May 30 12:51:16 EDT 2002


On Wed, 29 May 2002, Tim Peters wrote:

> > (hmm... Come to think of it, someone could probably find
> > *some* duplicate logic by running source files through the
> > tokenizer first. I wonder if that would work...)
>
> Brenda Baker has done some interesting work on this
> problem (not with Python in mind, but million-line C
> systems):
>
>     http://cm.bell-labs.com/who/bsb/
> 
> Her "On Finding Duplication and Near-Duplication in Large
> Software Systems" is a good entry into the literature.
>
> I have a self-serving reason for mentioning this: if
> somebody whips up a fast suffix tree for Python, I could
> put it to good use in ameliorating difflib.py's worst-case
> time sinks <wink>.

Hey Tim,

Thanks for the link! I found a JavaScript version of a
suffix tree algorithm online. I ported it to Python and it
seems to work... Unfortunately the original code is very
hard to understand. I did a straight port and then tried to
clean it up and make it a little more object-oriented, but
when I started looking into the algorithm, I just couldn't
follow what was going on.

Then I spent another couple of hours trying to rebuild it
from scratch in a Pythonic style, but again I couldn't get
my mind around the algorithm... Anyway, I spent all night
messing around with this stuff, and I'm giving up. If
someone wants to take a look, go here:

    http://cvs.sabren.com/sixthdev/cvsweb.cgi/sdunit/

NastySuffixTree.py is the working version.

SuffixTree.py is the cleaner version I tried to build, but
it doesn't implement the whole algorithm.
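
For anyone who just wants to see the shape of the data
structure, here is a minimal sketch of a naive suffix trie.
It is NOT the linear-time algorithm from the JavaScript
code and NOT what's in either file above; it inserts every
suffix one symbol at a time (quadratic time and space), and
the names Node, build_suffix_trie and longest_repeat are
made up for illustration. It does show why the structure is
useful for finding repeats: any node shared by two or more
suffixes spells out a run that occurs at least twice.

```python
class Node:
    """One trie node; children are keyed by a single symbol."""

    def __init__(self):
        self.children = {}
        self.count = 0  # number of suffixes that pass through this node


def build_suffix_trie(seq):
    """Insert every suffix of seq, one symbol per edge (naive, O(n^2))."""
    root = Node()
    for start in range(len(seq)):
        node = root
        for symbol in seq[start:]:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def longest_repeat(seq):
    """Return the longest run of symbols occurring at least twice in seq.

    Occurrences are allowed to overlap. The result is a list of symbols,
    so seq can be a string or a list of tokens.
    """
    root = build_suffix_trie(seq)
    best = []
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for symbol, child in node.children.items():
            if child.count >= 2:  # shared by two or more suffixes
                stack.append((child, path + [symbol]))
    return best


if __name__ == "__main__":
    print("".join(longest_repeat("banana")))
```

Because the trie only cares about hashable symbols, the
same function works on a list of tokens instead of a
string of characters.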

The JavaScript version is here:

    http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/

There's also a suffix tree module written in C, with a
Python binding, here:

    http://www-hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/
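
Going back to the tokenizer idea quoted at the top: here is
a rough sketch of how it might look without any suffix tree
at all. It runs source through the standard tokenize
module, replaces identifiers and literals with placeholders
so that renamed copies still match, and then looks for
fixed-length windows of tokens that show up more than once.
The fixed window is a simplification (a suffix tree would
find repeats of any length), and the names
normalized_tokens, duplicate_runs and the window parameter
are assumptions made up for this example.

```python
import io
import keyword
import tokenize
from collections import defaultdict

# Token types that carry layout rather than logic; dropped in this sketch.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source):
    """Return (symbol, line_number) pairs with names and literals abstracted."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            symbol = "ID"    # any identifier
        elif tok.type == tokenize.NUMBER:
            symbol = "NUM"   # any numeric literal
        elif tok.type == tokenize.STRING:
            symbol = "STR"   # any string literal
        else:
            symbol = tok.string  # keywords and operators stay as written
        out.append((symbol, tok.start[0]))
    return out


def duplicate_runs(source, window=8):
    """Map each token window seen more than once to its starting lines."""
    toks = normalized_tokens(source)
    seen = defaultdict(list)
    for i in range(len(toks) - window + 1):
        key = tuple(sym for sym, _ in toks[i:i + window])
        seen[key].append(toks[i][1])
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    sample = "a = b + c * 2\nx = y + z * 3\n"
    for key, lines in duplicate_runs(sample, window=7).items():
        print(" ".join(key), "starts on lines", lines)
```

A small window reports lots of trivial matches and a large
one misses short duplicates, which is exactly the problem a
suffix tree is meant to solve.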

Cheers,

- Michal   http://www.sabren.net/   sabren at manifestation.com 
------------------------------------------------------------
Switch to Cornerhost!             http://www.cornerhost.com/
       High Powered Hosting - With a Human Touch. :)
------------------------------------------------------------
