Efficiently determine where documents differ

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Jan 4 17:46:13 EST 2010


On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richardbp at gmail.com> wrote:

> I have been using the difflib library to find where 2 large HTML
> documents differ. The Differ().compare() method does this, but it is
> very slow - at least 100x slower than the Unix diff command.

Differ compares the documents as sequences of lines *and* then compares
similar lines as sequences of characters to provide intra-line differences.
The diff command only processes whole lines. If you aren't interested in
intra-line differences, use a SequenceMatcher on the lists of lines
instead. Or invoke the diff command using subprocess.Popen + communicate.
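
A minimal sketch of both suggestions, in case it helps. The function names
changed_line_ranges and external_diff are made up for illustration, and the
second one assumes a diff executable is available on PATH. This is untuned
code meant to show the idea, not a benchmarked solution.

```python
import difflib
import subprocess


def changed_line_ranges(a_lines, b_lines):
    """Line-level comparison only: return the non-equal opcodes.

    Each opcode is (tag, i1, i2, j1, j2), meaning a_lines[i1:i2]
    relates to b_lines[j1:j2] by 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def external_diff(path_a, path_b):
    """Run the system diff command (assumed to be on PATH) on two files."""
    proc = subprocess.Popen(
        ["diff", path_a, path_b],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    out, err = proc.communicate()
    # diff exits with 0 (identical), 1 (differences found) or >1 (trouble),
    # so only a return code above 1 is treated as an error here.
    if proc.returncode > 1:
        raise RuntimeError(err)
    return out


if __name__ == "__main__":
    old = ["<p>one</p>\n", "<p>two</p>\n", "<p>three</p>\n"]
    new = ["<p>one</p>\n", "<p>2</p>\n", "<p>three</p>\n"]
    for tag, i1, i2, j1, j2 in changed_line_ranges(old, new):
        print(tag, "old[%d:%d]" % (i1, i2), "new[%d:%d]" % (j1, j2))
```

For real files, pass the result of open(path).readlines() for each document
to changed_line_ranges.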

-- 
Gabriel Genellina



