[issue43473] Junks in difflib

New submission from Hubert Bonnisseur-De-La-Bathe <hubertbdlb@gmail.com>:

On first reading the difflib documentation, I thought that using junk would make

s = SequenceMatcher(lambda x: x == " ", "abcd efgh", "abcdefgh")
s.get_matching_blocks()

produce the result

[Match(a=0, b=0, size=8)]

On a second reading, it is clear that this call in fact returns two matches of length 4. Would it be nicer to have get_matching_blocks return the length-8 match? I don't know whether that is in the spirit of the library; I'm just asking.

----------
assignee: docs@python
components: Documentation
messages: 388491
nosy: docs@python, hubertbdlb
priority: normal
severity: normal
status: open
title: Junks in difflib
type: enhancement
versions: Python 3.8

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue43473>
_______________________________________
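The behavior described in the report can be checked directly with the public difflib API. A minimal sketch: in CPython's implementation the junk set is built from the elements of b (here "abcdefgh", which contains no space), and every returned Match satisfies a[i:i+n] == b[j:j+n], so no block can straddle the space that exists only in a.

```python
from difflib import SequenceMatcher

a, b = "abcd efgh", "abcdefgh"

# Mark the space as junk, as in the report above.
s = SequenceMatcher(lambda x: x == " ", a, b)
blocks = s.get_matching_blocks()
print(blocks)
# [Match(a=0, b=0, size=4), Match(a=5, b=4, size=4), Match(a=9, b=8, size=0)]

# Every block satisfies a[i:i+n] == b[j:j+n]; the final size-0 block is a sentinel.
for i, j, n in blocks:
    assert a[i:i + n] == b[j:j + n]
```

The two length-4 blocks are separate because the space at a[4] has no counterpart in b, and a single Match can only describe equal-length slices.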

Change by Karthikeyan Singaravelan <tir.karthi@gmail.com>:

----------
nosy: +tim.peters

Terry J. Reedy <tjreedy@udel.edu> added the comment:

Currently, each returned triple (i, j, n) means that a[i:i+n] == b[j:j+n], so both matching slices always have the same length. https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher.get_m...

This would not be the case if a has an ignored space and b does not. Changing the current definition would break existing code and would require quadruples to return two different lengths. That would require either a new parameter for the function to select the behavior or a new function with a new name. Either option would need to be justified by actual use cases, and I cannot see what they might be.

One way to have junk characters completely ignored is to strip them from both strings before calling SequenceMatcher.

----------
nosy: +terry.reedy
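The stripping workaround can be sketched as follows. The helpers strip_junk and blocks_ignoring_junk are hypothetical and not part of difflib: junk is removed from both strings, SequenceMatcher runs on the cleaned strings, and each block is mapped back to indices in the original strings. Because the mapped-back slices can differ in length, each result is reported as a 4-tuple (a_start, a_end, b_start, b_end), which illustrates why a junk-ignoring result would need four numbers instead of three.

```python
from difflib import SequenceMatcher


def strip_junk(s, isjunk):
    """Hypothetical helper: return (s without junk, list mapping new index -> original index)."""
    kept = [(i, ch) for i, ch in enumerate(s) if not isjunk(ch)]
    return "".join(ch for _, ch in kept), [i for i, _ in kept]


def blocks_ignoring_junk(a, b, isjunk):
    """Hypothetical helper: match a and b with junk removed from both, then
    report each block as (a_start, a_end, b_start, b_end) slices of the
    ORIGINAL strings. The two slices may differ in length."""
    sa, amap = strip_junk(a, isjunk)
    sb, bmap = strip_junk(b, isjunk)
    spans = []
    for i, j, n in SequenceMatcher(None, sa, sb).get_matching_blocks():
        if n == 0:
            continue  # skip the terminating size-0 sentinel block
        # Map the first and last matched characters back; end indices are exclusive.
        spans.append((amap[i], amap[i + n - 1] + 1, bmap[j], bmap[j + n - 1] + 1))
    return spans


print(blocks_ignoring_junk("abcd efgh", "abcdefgh", lambda ch: ch == " "))
# [(0, 9, 0, 8)]
```

A mapped-back span includes any junk lying between its first and last matched characters, so a[a_start:a_end] and b[b_start:b_end] are equal only once junk is removed from both.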
participants (3)
- Hubert Bonnisseur-De-La-Bathe
- Karthikeyan Singaravelan
- Terry J. Reedy