Efficiently determine where documents differ
richardbp at gmail.com
Tue Jan 5 10:43:12 CET 2010
On Jan 5, 9:46 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar> wrote:
> On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richar... at gmail.com> wrote:
> > I have been using the difflib library to find where 2 large HTML
> > documents differ. The Differ().compare() method does this, but it is
> > very slow - at least 100x slower than the unix diff command.
> Differ compares sequences of lines *and* lines as sequences of characters
> to provide intra-line differences. The diff command only processes lines.
> If you aren't interested in intra-line differences, use a SequenceMatcher
> instead. Or, invoke the diff command using subprocess.Popen +
> Gabriel Genellina
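Gabriel's second suggestion, handing the work to the external diff command, might look roughly like the sketch below. It assumes a POSIX `diff` executable on PATH and that both documents are already saved as files; the helper name `unix_diff` is hypothetical. It uses `subprocess.Popen` with `communicate()` to collect the output, and relies on diff's documented exit status (0 = identical, 1 = different, greater than 1 = error).

```python
import shutil
import subprocess


def unix_diff(path_a, path_b):
    """Hypothetical helper: run the external `diff` on two files.

    Returns (differs, output). Assumes a POSIX `diff` is on PATH.
    """
    if shutil.which("diff") is None:
        raise RuntimeError("no diff executable found on PATH")
    proc = subprocess.Popen(
        ["diff", path_a, path_b],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode stdout/stderr as text
    )
    out, err = proc.communicate()  # wait for diff and read both pipes
    # diff exits with 0 (no differences), 1 (differences) or >1 (trouble).
    if proc.returncode not in (0, 1):
        raise RuntimeError("diff failed: " + err.strip())
    return proc.returncode == 1, out
```

On Python 3.5 and later, `subprocess.run(..., capture_output=True)` (3.7+) is a shorter way to express the same call.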
Thank you very much, Gabriel! Passing a list of the document lines
makes the efficiency comparable to the diff command.
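A minimal sketch of the approach Richard describes: each document is split with `str.splitlines()` and the two lists are handed to `difflib.SequenceMatcher`, so every element compared is a whole line and no intra-line character matching is done. The helper name `changed_line_ranges` is hypothetical; `get_opcodes()` is the standard `SequenceMatcher` method that yields `(tag, i1, i2, j1, j2)` tuples.

```python
import difflib


def changed_line_ranges(old_text, new_text):
    """Hypothetical helper: report line ranges where two documents differ.

    Returns a list of (tag, (i1, i2), (j1, j2)) tuples, where tag is
    'replace', 'delete' or 'insert', and the ranges are half-open line
    indices into the old and new documents.
    """
    # Compare lists of lines, not raw strings: a string would be matched
    # character by character, which is much slower on large documents.
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [
        (tag, (i1, i2), (j1, j2))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the regions that changed
    ]


old = "<html>\n<body>\n<p>one</p>\n</body>\n</html>"
new = "<html>\n<body>\n<p>two</p>\n</body>\n</html>"
print(changed_line_ranges(old, new))  # → [('replace', (2, 3), (2, 3))]
```

One caveat for HTML: for sequences of 200 or more lines, `SequenceMatcher`'s default `autojunk` heuristic treats very frequent lines (such as repeated closing tags) as junk, which can change the reported matches; passing `autojunk=False` disables it at some cost in speed.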