python tool: finding duplicate code

Tim Peters tim.one at comcast.net
Mon Jun 3 17:27:05 EDT 2002


[Michal Wallace]
> Thanks for the link! I found a javascript version of a
> suffix tree algorithm online. I ported it to python and it
> seems to work... Unfortunately the original code is very
> hard to understand. I did a straight port and then tried to
> clean it up and make it a little more object oriented, but
> when I started looking into the algorithm, I just couldn't
> track what was going on.

Don't feel bad!  Suffix trees are very surprising.  In fact, they caught
everyone by surprise when they were invented, allowing linear-time solution
of several problems that were conjectured to be impossible to solve in
linear time.  Ukkonen's algorithm (on which the JavaScript version seems to
be based) is considered much clearer than its predecessors, but it's also
subtle to the point of madness <wink>.

A much more obvious approach is to materialize all the suffixes explicitly
into a list, and then just sort the list.  Stop right there and you're close
to a "suffix array".  Building a tree from the sorted list is simple then.
But this doesn't work in linear time, so misses part of the point.

In any case, suffix tree algorithms probably need to be coded in C to be of
practical use (lots of low-level fiddling involved).

> ...
> There's also a suffix tree module written in C, but with a
> python binding here:
>
>     http://www-hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/

Wow!  Danny sure gets around -- if I can weasle a few free hours I'll
definitely play with this.  Thanks for the link!






More information about the Python-list mailing list