[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Terry Reedy tjreedy at udel.edu
Wed Jul 14 03:45:25 CEST 2010


Summary: adding an autojunk heuristic to difflib without also adding a 
way to turn it off was a bug because it broke code that previously worked.

2.6 and 3.1 will most likely each have one final release. Don't fix 
the problem for these, but add something to the docs explaining the 
problem and the future fix.

2.7 will have several more versions over several years and will be used 
by newcomers who might encounter the problem but not know how to diagnose 
it and patch a private copy of the module. So it should have a fix. 
Solutions thought of so far:

1. Modify the heuristic to somewhat fix the problem. Bad (unacceptable) 
because this would silently change behavior and could break tests.

2. Add a parameter that defaults to using the heuristic but allows 
turning it off. Perhaps better, but code that used the new API would 
crash if run on 2.7.0.

3. Tim Peters:
> Think the most pressing thing is to give people a way to turn the damn
> thing off.  An ugly way would be to trigger on an unlikely
> input-output behavior of the existing isjunk argument.  For example,
> if
>
>      isjunk("what's the airspeed velocity of an unladen swallow?")
>
> returned
>
>      "don't use auto junk!"
>
> and 2.7.1 recognized that as meaning "don't use auto junk", code could
> be written under 2.7.1 that didn't blow up under 2.7.  It could
> _behave_ differently, although that's true of any way of disabling the
> auto-junk heuristics.

Ugly, but perhaps crazy brilliant. Use of such a hack would obviously be 
temporary. Perhaps its use could issue a warning when Python is run 
with the -3 flag.

I would simplify the suggestion to something like
     isjunk("disable!heuristic") == True
so one could pass
     lambda s:s=="disable!heuristic"
It should be something easy to document and write. This issue is the 
only place such a string should appear, so it should be safe.
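
To make the idea concrete, here is a rough sketch of how a patched 
module might probe isjunk for the magic string. Everything in it is 
an assumption for illustration: DISABLE_SENTINEL and wants_autojunk are 
made-up names, nothing like them exists in difflib, and the choice to 
keep the heuristic when the probe raises is just one possibility.

```python
import difflib

# Hypothetical sentinel; the exact string is only the one floated above.
DISABLE_SENTINEL = "disable!heuristic"


def wants_autojunk(isjunk):
    """Return False only if isjunk flags the sentinel string as junk.

    Hypothetical helper, not part of difflib. A patched SequenceMatcher
    could call something like this once before building its junk tables.
    """
    if isjunk is None:
        return True
    try:
        return not (isjunk(DISABLE_SENTINEL) == True)
    except Exception:
        # An isjunk written for non-string elements may reject a string
        # probe; this sketch keeps the heuristic on in that case.
        return True


# Caller side: the one-liner suggested above.
no_autojunk = lambda s: s == DISABLE_SENTINEL

# An unpatched difflib accepts the same callable as an ordinary isjunk
# that never matches real elements, so the call does not blow up there.
matcher = difflib.SequenceMatcher(no_autojunk, "abcd", "abce")
ratio = matcher.ratio()
```

The point of the sketch is only that the same calling code runs on both 
an unpatched and a patched module; only the patched one would act on 
the answer.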

Tim and Antoine: if you two can agree on what to do for 2.7, Eli and I 
will code it.

This amounts to suggesting that the fix for 2.7 be decoupled from a 
better fix for 3.2. I agree. The latter can be discussed once 2.7 is 
settled.

[copied to the tracker]
-- 
Terry Jan Reedy


