Efficiently determine where documents differ
Richard
richardbp at gmail.com
Tue Jan 5 04:43:12 EST 2010
On Jan 5, 9:46 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar> wrote:
> On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richar... at gmail.com> wrote:
>
> > I have been using the difflib library to find where 2 large HTML
> > documents differ. The Differ().compare() method does this, but it is
> > very slow - at least 100x slower than the Unix diff command.
>
> Differ compares sequences of lines *and* lines as sequences of characters
> to provide intra-line differences. The diff command only processes lines.
> If you aren't interested in intra-line differences, use a SequenceMatcher
> instead. Or, invoke the diff command using subprocess.Popen +
> communicate.
>
> --
> Gabriel Genellina
Thank you very much, Gabriel! Passing a list of the document lines
makes the performance comparable to that of the diff command.
Richard
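
For anyone finding this thread later, here is a minimal sketch of the
line-level approach Gabriel suggested. It is an illustration, not
necessarily the exact code I ended up with: the function name
changed_regions and the sample documents are made up for this example,
and autojunk=False is my own assumption (it needs Python 2.7.1 or newer,
and turns off the heuristic that ignores very frequent lines in long
sequences).

```python
import difflib


def changed_regions(old_text, new_text):
    """Return (tag, old_lines, new_lines) for each region where the texts differ.

    Hypothetical helper: compares whole lines only, with no intra-line diff.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Each list element is one line, so SequenceMatcher matches lines
    # against lines instead of comparing character by character.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # remaining tags: 'replace', 'delete', 'insert'
            regions.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return regions


if __name__ == "__main__":
    old_doc = "<html>\n<body>\n<p>one</p>\n<p>two</p>\n</body>\n</html>\n"
    new_doc = "<html>\n<body>\n<p>one</p>\n<p>2</p>\n</body>\n</html>\n"
    for tag, old, new in changed_regions(old_doc, new_doc):
        print(tag, old, new)
```

The opcodes also carry the line indices (i1, i2, j1, j2), so the same
loop can report where in each document a difference starts if only the
positions are needed.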