Efficiently determine where documents differ
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Jan 4 17:46:13 EST 2010
On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richardbp at gmail.com> wrote:
> I have been using the difflib library to find where 2 large HTML
> documents differ. The Differ().compare() method does this, but it is
> very slow - at least 100x slower than the Unix diff command.
Differ compares the inputs as sequences of lines *and* then compares
similar lines as sequences of characters, in order to report intra-line
differences. The diff command only compares whole lines.
If you aren't interested in intra-line differences, use a SequenceMatcher
on the two lists of lines instead. Or invoke the diff command using
subprocess.Popen + communicate.
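
A minimal sketch of both suggestions, assuming each document has already been split into a list of lines. The helper names changed_line_ranges and external_diff are made up for illustration, the sample data is invented, and autojunk=False assumes a Python version that has that parameter (2.7.1 / 3.2 or later). The second helper also assumes a diff executable is on the PATH.

```python
import difflib
import subprocess


def changed_line_ranges(a_lines, b_lines):
    """Compare whole lines only; return the non-equal opcodes.

    Each opcode is (tag, i1, i2, j1, j2): a_lines[i1:i2] was replaced,
    deleted or inserted relative to b_lines[j1:j2].
    """
    # autojunk=False turns off the "popular element" heuristic, which can
    # otherwise ignore lines that repeat often in long inputs.
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def external_diff(path_a, path_b):
    """Run the system diff on two files and return its output as text."""
    proc = subprocess.Popen(
        ["diff", path_a, path_b],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    # diff exits with 0 (no differences) or 1 (differences found);
    # anything higher signals trouble such as a missing file.
    if proc.returncode > 1:
        raise RuntimeError(err.decode(errors="replace"))
    return out.decode(errors="replace")


# Invented sample data, just to show the shape of the result.
old = ["<html>", "<body>", "<p>one</p>", "</body>", "</html>"]
new = ["<html>", "<body>", "<p>two</p>", "<p>three</p>", "</body>", "</html>"]

for tag, i1, i2, j1, j2 in changed_line_ranges(old, new):
    print(tag, old[i1:i2], new[j1:j2])
```

For files on disk, something like open(path).readlines() would give the lists of lines. Whether this gets close to the speed of diff on large HTML documents is something to measure rather than assume.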
--
Gabriel Genellina