difflib-like library supporting moved blocks detection?

Chris Torek nospam at torek.net
Thu Jul 14 00:47:06 EDT 2011


In article <mailman.1002.1310591600.1164.python-list at python.org>
Vlastimil Brom  <vlastimil.brom at gmail.com> wrote:
>I'd like to ask about the availability of a text diff library, like
>difflib, which would support the detection of moved text blocks.

If you allow arbitrary moves, the "minimal edit distance" problem
(string-to-string edit) becomes substantially harder.  If you only
allow insert, delete, or in-place-substitute, you have what is
called the "Levenshtein distance" case.  If you allow transpositions
you get "Damerau-Levenshtein".  These are both solveable with a
dynamic programming algorithm.  Once you allow move operations,
though, the problem becomes NP-complete.

See http://pages.cs.brandeis.edu/~shapird/publications/JDAmoves.pdf
for instance.  (They give an algorithm that produces "usually
acceptable" results in polynomial time.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



More information about the Python-list mailing list