difflib.ndiff broken?

Tim Peters tim.peters at gmail.com
Fri Jul 16 00:25:31 CEST 2004


[Humpdydum]
> Can anyone try the following in their python interpreter?
>
> These give correct output:
> 
> >>> print list(ndiff(['saving2 <<A'],['saving <<a>>']))
> ['- saving2 <<A', '?       -   ^\n', '+ saving <<a>>', '?          ^^^\n']
> >>> print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
> ['- saving2 <<AA', '?       -   ^^\n', '+ saving <<a>>', '?          ^^^\n']
> >>> print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
> ['- saving2 <<A', '?       -   ^\n', '+ saving <<aa>>', '?          ^^^^\n']
> >>> print list(ndiff(['saving <<A'],['saving <<aa>>']))
> ['- saving <<A', '?          ^\n', '+ saving <<aa>>', '?          ^^^^\n']
> 
> Now try the very slight variations:
> 
> >>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
> ['- saving2 <<AA', '+ saving <<aa>>']
> 
> This can't be right... or is it? Where are the '? ...' lines? It does this
> for both Python 2.3.2 on Windows 2000 and Python 2.3.3 on SGI. If it's
> correct, how come???

ndiff produces intraline difference marking if and only if it thinks
the inputs are "reasonably close".  The cutoff between "reasonably
close" and "not reasonably close" is necessarily heuristic.  '?' lines
are more irritating than helpful when they have a lot of markup in
them, so it certainly wasn't intended that '?' lines *always* be
produced.  The '+' and '-' lines contain all the information about how
to change one sequence into another; the '?' lines are fluff (albeit
sometimes helpful fluff -- that's why they're (sometimes) there).
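A minimal sketch of the claim that '?' lines are optional, written for Python 3 rather than the 2.3 interpreters in this thread. It takes one of the pairs from the question (one that does get '?' lines), strips every '?' line from ndiff's output, and checks that difflib.restore() still rebuilds both inputs from the '-' and '+' lines alone.

```python
import difflib

old = ['saving2 <<AA']
new = ['saving <<a>>']

# Full ndiff output; for this pair it includes '?' guide lines.
delta = list(difflib.ndiff(old, new))

# Drop the '?' lines, keeping only the '-', '+' and '  ' lines.
no_hints = [line for line in delta if not line.startswith('?')]

# difflib.restore() rebuilds one side of the delta:
# which=1 gives the first sequence, which=2 the second.
# It only looks at '  ' and '-'/'+' lines, so '?' lines never matter.
rebuilt_old = list(difflib.restore(no_hints, 1))
rebuilt_new = list(difflib.restore(no_hints, 2))

print(delta)
print(no_hints)
print(rebuilt_old == old, rebuilt_new == new)
```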

Concretely, ndiff produces intraline marking iff two lines have a
similarity ratio of at least 0.75.  In your first examples, the lines
do:

>>> import difflib
>>> m = difflib.SequenceMatcher()
>>> m.set_seqs('saving2 <<A', 'saving <<a>>')
>>> print m.ratio()
0.782608695652

In your last examples, the lines don't:

>>> m.set_seqs('saving2 <<AA', 'saving <<aa>>')
>>> print m.ratio()
0.72
>>>
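SequenceMatcher.ratio() is 2*M/T, where M is the number of matched characters and T is the total length of both strings. A small Python 3 sketch (the helper name ratio_breakdown is hypothetical, made up for illustration) shows why the second pair falls below the cutoff: both pairs match the same 9 characters ("saving" plus " <<"), but the second pair is two characters longer overall.

```python
from difflib import SequenceMatcher
from fractions import Fraction


def ratio_breakdown(a, b):
    """Hypothetical helper: return (M, T, exact 2*M/T, float ratio)."""
    m = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a dummy block of size 0,
    # so summing every block's size gives the matched-character count.
    matched = sum(block.size for block in m.get_matching_blocks())
    total = len(a) + len(b)
    return matched, total, Fraction(2 * matched, total), m.ratio()


for a, b in [('saving2 <<A', 'saving <<a>>'),
             ('saving2 <<AA', 'saving <<aa>>')]:
    matched, total, exact, r = ratio_breakdown(a, b)
    print(f'{a!r} vs {b!r}: 2*{matched}/{total} = {exact} ~ {r:.4f}')
```

Both pairs give M = 9; only T changes, from 23 to 25, which moves the ratio from 18/23 (about 0.78) to 18/25 (exactly 0.72).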

Internally, the 0.75 cutoff is hard-coded in Differ._fancy_replace, the
private method ndiff relies on when a block of lines has been replaced.
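To check the 0.75 rule against every example in the question, here is a Python 3 sketch. The helpers predicts_intraline and has_intraline are hypothetical, and the CUTOFF constant is an assumption that mirrors the threshold described above; the real check lives in difflib's private internals and may differ in detail between versions. The prediction uses difflib.IS_CHARACTER_JUNK as the junk filter because that is ndiff's default charjunk, and it only covers single-line replacements like the ones in this thread.

```python
import difflib

# Assumption: mirrors the 0.75 threshold described above.
CUTOFF = 0.75


def predicts_intraline(a_line, b_line, cutoff=CUTOFF):
    """Hypothetical helper: guess whether ndiff emits '?' lines when
    a_line is replaced by b_line, using ndiff's default character junk."""
    sm = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a_line, b_line)
    return sm.ratio() >= cutoff


def has_intraline(a_line, b_line):
    """Hypothetical helper: does ndiff actually emit any '?' lines?"""
    return any(line.startswith('?')
               for line in difflib.ndiff([a_line], [b_line]))


# The five pairs from the original question.
pairs = [
    ('saving2 <<A', 'saving <<a>>'),
    ('saving2 <<AA', 'saving <<a>>'),
    ('saving2 <<A', 'saving <<aa>>'),
    ('saving <<A', 'saving <<aa>>'),
    ('saving2 <<AA', 'saving <<aa>>'),
]

for a, b in pairs:
    sm = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b)
    print(f'{a!r:16} {b!r:16} ratio={sm.ratio():.4f} '
          f'predicted={predicts_intraline(a, b)} '
          f'actual={has_intraline(a, b)}')
```

The first four pairs sit at or above 0.75 (two of them exactly at 0.75) and get '?' lines; only the last pair, at 0.72, does not.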