difflib-like library supporting moved blocks detection?
Chris Torek
nospam at torek.net
Thu Jul 14 00:47:06 EDT 2011
In article <mailman.1002.1310591600.1164.python-list at python.org>
Vlastimil Brom <vlastimil.brom at gmail.com> wrote:
>I'd like to ask about the availability of a text diff library, like
>difflib, which would support the detection of moved text blocks.
If you allow arbitrary moves, the "minimal edit distance" problem
(string-to-string edit) becomes substantially harder. If you only
allow insert, delete, or in-place-substitute, you have what is
called the "Levenshtein distance" case. If you allow transpositions
you get "Damerau-Levenshtein". These are both solveable with a
dynamic programming algorithm. Once you allow move operations,
though, the problem becomes NP-complete.
See http://pages.cs.brandeis.edu/~shapird/publications/JDAmoves.pdf
for instance. (They give an algorithm that produces "usually
acceptable" results in polynomial time.)
--
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
More information about the Python-list
mailing list