Comparing two book chapters (text files)

Chris Rebert crebert at
Wed Feb 4 20:41:41 EST 2009

On Wed, Feb 4, 2009 at 5:20 PM, Nick Matzke <matzke at> wrote:
> Hi all,
> So I have an interesting challenge.  I want to compare two book chapters,
> which I have in plain text format, and find out (a) percentage similarity
> and (b) what has changed.
> Some features make this problem different than what seems to be the standard
> text-matching problem solvable with e.g. difflib.  Here is what I mean:
> * there is no guarantee that single lines from each file will be directly
> comparable -- e.g., if a few words are inserted into a sentence, then a
> chunk of the sentence will be moved to the next line, then a chunk of that
> line moved to the next, etc.
> * Also, there are cases where paragraphs have been moved around, sections
> re-ordered, etc.  So it can't just be a "linear" match.
> I imagine this kind of thing can't be all that hard in the grand scheme of
> things, but I couldn't find an easily applicable solution readily available.
>  I have advanced beginner python skills but am not quite where I could do
> this kind of thing from scratch without some guidance about the likely
> functions, libraries etc. to use.
> PS: I am going to have to do this for multiple book chapters so various
> software packages, e.g. for windows, are not really usable.

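Though not written in Python, wdiff might be a good starting point: it
compares files word by word rather than line by line, so re-wrapped
lines are not reported as changes.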
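
For illustration only, here is a minimal Python sketch of the same
word-level idea using the standard library's difflib. It is not wdiff
and not a tested solution for real chapters; the function names
word_diff and match_paragraphs are made up for this example, and
splitting paragraphs on blank lines is an assumption about the input
files.

```python
import difflib
import re


def word_diff(old_text, new_text):
    """Compare two texts word by word, so line breaks do not matter.

    Returns (similarity, changes): similarity is difflib's ratio in
    [0, 1]; changes lists (tag, old_words, new_words) for every span
    that is not identical.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False: do not let difflib discard very frequent words
    matcher = difflib.SequenceMatcher(None, old_words, new_words,
                                      autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag,
                            " ".join(old_words[i1:i2]),
                            " ".join(new_words[j1:j2])))
    return matcher.ratio(), changes


def match_paragraphs(old_text, new_text):
    """For each new paragraph, find the most similar old paragraph.

    Assumes paragraphs are separated by blank lines. Returns a list of
    (old_index, similarity), one entry per new paragraph, which gives a
    rough way to spot paragraphs that were moved around.
    """
    def split(text):
        return [p.split() for p in re.split(r"\n\s*\n", text) if p.strip()]

    old_paras = split(old_text)
    new_paras = split(new_text)
    if not old_paras:
        return []
    results = []
    for new in new_paras:
        scores = [difflib.SequenceMatcher(None, old, new,
                                          autojunk=False).ratio()
                  for old in old_paras]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((best, scores[best]))
    return results
```

Multiplying the ratio from word_diff by 100 gives one possible
"percentage similarity"; the paragraph matching is quadratic in the
number of paragraphs, which should be tolerable for a single chapter
but may be slow on much larger inputs.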


Follow the path of the Iguana...
