Efficiently determine where documents differ

Richard richardbp at gmail.com
Mon Jan 4 23:04:12 CET 2010


I have been using the difflib library to find where two large HTML
documents differ. The Differ().compare() method does this, but it is
very slow - at least 100x slower than the Unix diff command.

How can I efficiently determine where two documents differ in Python?
(Ideally I am after the positions of the differences rather than the
actual text, i.e. the kind of index ranges that
SequenceMatcher().get_opcodes() returns.)
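
To illustrate the kind of positional output meant here, below is a minimal
sketch that compares the documents line by line and keeps only the opcodes
that are not "equal". The helper name changed_ranges and the sample strings
are made up for the example, and autojunk=False assumes a Python version
whose SequenceMatcher accepts that argument. It shows the desired output
format only and is not claimed to be fast enough for large inputs.

```python
from difflib import SequenceMatcher


def changed_ranges(old_text, new_text):
    """Return (tag, i1, i2, j1, j2) line ranges where the two texts differ.

    changed_ranges is a hypothetical helper name used for illustration.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Compare sequences of lines rather than characters.
    # autojunk=False turns off the "popular element" heuristic so that
    # frequently repeated lines are still considered for matching.
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    # Keep only the opcodes describing a difference.
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Made-up sample documents.
old = "<html>\n<p>one</p>\n<p>two</p>\n</html>"
new = "<html>\n<p>one</p>\n<p>2</p>\n<p>three</p>\n</html>"

print(changed_ranges(old, new))
# [('replace', 2, 3, 2, 4)]
```

Each tuple says that lines i1:i2 of the first document correspond to lines
j1:j2 of the second, so no diff text has to be generated or parsed.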

