[issue2986] difflib.SequenceMatcher not matching long sequences

Terry J. Reedy report at bugs.python.org
Wed Jul 14 03:45:24 CEST 2010


Terry J. Reedy <tjreedy at udel.edu> added the comment:

[copied from pydev post]

Summary: adding an autojunk heuristic to difflib without also adding a way to turn it off was a bug, because it broke code that previously worked.
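
To make the problem concrete, here is a minimal sketch. It assumes the heuristic's thresholds as documented for difflib (the second sequence has at least 200 items, and an item makes up more than about 1% of it). The strings and the helper name `similarity` are made up for illustration and are not from the original report.

```python
from difflib import SequenceMatcher

def similarity(n):
    """Ratio for two made-up strings that differ by one leading character."""
    a = "x" + "y" * n
    b = "y" * n  # b is the sequence the heuristic inspects
    return SequenceMatcher(None, a, b).ratio()

short_ratio = similarity(100)  # b has fewer than 200 items: heuristic inactive
long_ratio = similarity(200)   # b has 200 items: "y" is treated as junk

print(short_ratio, long_ratio)
```

The two pairs are equally similar by construction, so one would expect nearly equal ratios. With the heuristic active, the longer pair should score far lower than the shorter one.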

2.6 and 3.1 will each have, most likely, one final release. Don't fix the bug for these, but add something to the docs explaining the problem and the future fix.

2.7 will have several more versions over several years and will be used by newcomers who might encounter the problem but not know how to diagnose it and patch a private copy of the module. So it should have a fix. Solutions thought of so far:

1. Modify the heuristic to somewhat fix the problem. Bad (unacceptable) because this would silently change behavior and could break tests.

2. Add a parameter that defaults to using the heuristic but allows turning it off. Perhaps better, but code that used the new API would crash if run on 2.7.0.

3. Tim Peters:
> Think the most pressing thing is to give people a way to turn the damn
> thing off.  An ugly way would be to trigger on an unlikely
> input-output behavior of the existing isjunk argument.  For example,
> if
> 
>      isjunk("what's the airspeed velocity of an unladen swallow?")
> 
> returned
> 
>      "don't use auto junk!"
> 
> and 2.7.1 recognized that as meaning "don't use auto junk", code could
> be written under 2.7.1 that didn't blow up under 2.7.  It could
> _behave_ differently, although that's true of any way of disabling the
> auto-junk heuristics.

Ugly, but perhaps crazy brilliant. Use of such a hack would obviously be temporary. Perhaps its use could issue a -3 warning when those are enabled.

I would simplify the suggestion to something like
    isjunk("disable!heuristic") == True
so one could pass
    lambda s: s == "disable!heuristic"
It should be something easy to document and write. This issue is the only place such a string should appear, so it should be safe.
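
A rough sketch of how the library side of this hack might look, purely as an illustration: the helper name `heuristic_disabled` and the constant `SENTINEL` are hypothetical, and nothing here is existing difflib code. The idea is to probe the caller's isjunk once with the magic string before deciding whether to apply the heuristic.

```python
SENTINEL = "disable!heuristic"  # the magic string proposed in this message

def heuristic_disabled(isjunk):
    """Hypothetical check: True if the caller's isjunk flags the sentinel."""
    if isjunk is None:
        return False
    try:
        return bool(isjunk(SENTINEL))
    except Exception:
        # An isjunk written for non-string items may fail on the probe;
        # treat that as "heuristic stays on".
        return False

# What a caller would pass to opt out, as suggested above.
opt_out = lambda s: s == SENTINEL

print(heuristic_disabled(opt_out))               # caller asked to opt out
print(heuristic_disabled(lambda s: s in " \t"))  # ordinary junk function
```

Note that an isjunk that calls everything junk (for example `lambda s: True`) would also trip this check, which is one reason the hack would have to be temporary.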

Tim and Antoine: if you two can agree on what to do for 2.7, Eli and I will code it.

This amounts to a suggestion that the fix for 2.7 be decoupled from a better fix for 3.2. I agree. The latter can be discussed once 2.7 is settled.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2986>
_______________________________________

