[issue2986] difflib.SequenceMatcher not matching long sequences
report at bugs.python.org
Thu Jul 8 01:17:03 CEST 2010
Vlastimil Brom <vlastimil.brom at gmail.com> added the comment:
I guess I am not supposed to post to python-dev, not being a Python developer, so hopefully it is appropriate to add a comment here, based only on my current usage of (a modified) difflib.SequenceMatcher.
It seems that the mentions of text comparison in that thread, e.g.
etc. rather imply line-by-line comparison, and possibly character comparison of matched lines.
For me the direct character-wise comparison is more useful in most cases.
With the popular heuristic disabled, the results look pretty good.
(The script only changes the background colour of the compared texts, based on SequenceMatcher.get_opcodes().)
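As a rough illustration of that kind of script, here is a minimal sketch of collecting the character ranges that a highlighter would need to colour. The helper name changed_spans is made up for this example; only SequenceMatcher and get_opcodes() come from difflib.

```python
import difflib

def changed_spans(old, new):
    """Collect (tag, old_range, new_range) for every opcode that is not 'equal'."""
    # isjunk=None: no element is treated as junk by a user-supplied function.
    matcher = difflib.SequenceMatcher(None, old, new)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # old[i1:i2] and new[j1:j2] are the ranges a GUI would highlight.
            spans.append((tag, (i1, i2), (j1, j2)))
    return spans

for tag, old_range, new_range in changed_spans("colour of text", "color of the text"):
    print(tag, old_range, new_range)
```

The tags are 'replace', 'delete' and 'insert'; a display layer can map each of them to a background colour of its own.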
Right now I only need to disable the popular check; currently I use a monkey-patched subclass of SequenceMatcher with an extended signature and a modified __chain_b function.
I would vote for extending the SequenceMatcher API to allow adjustments (keeping the current values as the defaults): enabling or disabling the popular check, and setting the thresholds for string length and "popular" frequency (and possibly other parameters that might be added).
Are there any restrictions on API changes in a library due to a moratorium, even if the default behaviour remains unchanged?
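A minimal sketch of what such a switch could look like from the caller's side. It assumes a Python version in which SequenceMatcher accepts the autojunk keyword (3.2 and later, so not the version this comment was written against); the wrapper name char_matcher, the parameter name popular_check and the sample strings are hypothetical and only serve the illustration.

```python
import difflib

def char_matcher(a, b, popular_check=True):
    # `popular_check` is a made-up name for this sketch; it is simply passed on
    # as the `autojunk` keyword (assumption: Python 3.2 or later).
    return difflib.SequenceMatcher(None, a, b, autojunk=popular_check)

a = "abcd" * 60   # 240 characters, each letter occurs 60 times
b = "X" + a       # the same text with one character prepended

with_check = char_matcher(a, b, popular_check=True).ratio()
without_check = char_matcher(a, b, popular_check=False).ratio()
print(with_check, without_check)
```

Because the second sequence has at least 200 elements and every letter in it is frequent, the popular check should discard those letters, so the first ratio is expected to be far lower than the second, although the two texts differ by a single character.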
Otherwise, what might be the disadvantages of this approach?
If the current behaviour is considered appropriate for the original use cases, other uses would also be made possible or easier, at the sole cost of learning the meaning of the added parameters (from the enhanced docs, of course).
Python tracker <report at bugs.python.org>