Efficiently determine where documents differ
gagsl-py2 at yahoo.com.ar
Mon Jan 4 23:46:13 CET 2010
On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richardbp at gmail.com> wrote:
> I have been using the difflib library to find where 2 large HTML
> documents differ. The Differ().compare() method does this, but it is
> very slow - at least 100x slower than the unix diff command.
Differ compares sequences of lines *and* lines as sequences of characters
to provide intra-line differences. The diff command only processes lines.
If you aren't interested in intra-line differences, use a SequenceMatcher
instead. Or, invoke the diff command using subprocess.Popen +
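A minimal sketch of the SequenceMatcher approach, comparing the documents line by line only. The helper name changed_line_ranges is made up for illustration; SequenceMatcher and get_opcodes() are the standard difflib API. No intra-line comparison is done, which is where Differ spends the extra time.

```python
import difflib

def changed_line_ranges(old_text, new_text):
    """Return the opcodes for regions where the two texts differ, line by line.

    Hypothetical helper, not part of difflib. Each item is a tuple
    (tag, i1, i2, j1, j2): old_lines[i1:i2] was replaced, deleted, or
    had new_lines[j1:j2] inserted, depending on tag.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Passing None as the first argument means no elements are treated as junk.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    # Keep only the non-matching regions; 'equal' blocks are skipped.
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Each returned tuple gives line indices into the two documents, so the differing lines can be sliced out directly with old_lines[i1:i2] and new_lines[j1:j2].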