[New-bugs-announce] [issue12384] difflib.SequenceMatcher and Match: code and doc bugs

Terry J. Reedy report at bugs.python.org
Tue Jun 21 22:00:41 CEST 2011

New submission from Terry J. Reedy <tjreedy at udel.edu>:

The basic problem: in 2.6, a namedtuple was introduced to difflib:

from collections import namedtuple as _namedtuple
Match = _namedtuple('Match', 'a b size')
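For reference, here is a minimal standalone sketch of what this definition provides, run outside difflib. Match is still a tuple, so it compares equal to the plain triple the older docs describe, but its fields can also be read by name:

```python
from collections import namedtuple

# Same definition as in difflib, where namedtuple is imported as _namedtuple.
Match = namedtuple('Match', 'a b size')

m = Match(a=3, b=2, size=2)
print(m)                 # Match(a=3, b=2, size=2)
print(m.a, m.b, m.size)  # 3 2 2: fields readable by name
i, j, n = m              # still unpacks like the documented (i, j, n) triple
print(m == (3, 2, 2))    # True: equality is ordinary tuple equality
print(Match._make((3, 2, 2)) == m)  # True: _make builds a Match from an iterable
```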

and used for the return values of SequenceMatcher.find_longest_match and .get_matching_blocks. The code, docstrings, and docs were only partially updated to match.


    def get_matching_blocks(self):
        """Return list of triples describing matching subsequences.
        Each triple is of the form (i, j, n), and means that
        ...
        """
        if self.matching_blocks is not None:
            return self.matching_blocks
        ...
        self.matching_blocks = non_adjacent
        return map(Match._make, self.matching_blocks)

The two returns differ because only the second was changed: the first call returns a map of Match objects, while later calls return the cached list of plain tuples.
The obvious fix is to change the first to match. Or perhaps self.matching_blocks (an undocumented cache) should be the map object.
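A minimal sketch of the inconsistency and of the obvious fix, using hypothetical classes (BuggyMatcher, FixedMatcher) and a hypothetical fixed block list (_NON_ADJACENT) that only model the caching logic quoted above; they are not difflib code:

```python
from collections import namedtuple

Match = namedtuple('Match', 'a b size')

# Hypothetical fixed data standing in for the real block computation.
_NON_ADJACENT = [(0, 0, 2), (3, 2, 2), (5, 4, 0)]


class BuggyMatcher:
    """Hypothetical model of the caching logic quoted above, not difflib itself."""

    def __init__(self):
        self.matching_blocks = None  # undocumented cache, as in difflib

    def get_matching_blocks(self):
        if self.matching_blocks is not None:
            return self.matching_blocks  # later calls: cached list of plain tuples
        self.matching_blocks = list(_NON_ADJACENT)
        return map(Match._make, self.matching_blocks)  # first call: map of Match


class FixedMatcher(BuggyMatcher):
    """The 'obvious fix': wrap the cached tuples with Match on both paths."""

    def get_matching_blocks(self):
        if self.matching_blocks is not None:
            return map(Match._make, self.matching_blocks)
        self.matching_blocks = list(_NON_ADJACENT)
        return map(Match._make, self.matching_blocks)


buggy = BuggyMatcher()
first = buggy.get_matching_blocks()
second = buggy.get_matching_blocks()
print(type(first).__name__, type(second).__name__)  # map list
print(list(first)[0], second[0])  # Match(a=0, b=0, size=2) (0, 0, 2)

fixed = FixedMatcher()
print(list(fixed.get_matching_blocks()) == list(fixed.get_matching_blocks()))  # True
```

With the fix, every call returns a fresh map of Match objects, so repeated calls agree.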

Docstring and doc for .find_longest_match():

Both start
 "Find longest matching block ... returns (i, j, k) such that ..."
The doc (but not the docstring) explicitly says, at the *bottom* of the entry, "This method returns a named tuple Match(a, b, size).", whose field names differ from (i, j, k). For 2.7, the note is preceded by "Changed in version 2.6:".
The examples show the change before it is described.

I think that the current return should be accurately described at the *top* of the entry, not the bottom. 2.7 would then end with "Changed in version 2.6: return Match instead of tuple."
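Running the doc's own example strings through the public SequenceMatcher API shows why leading with Match would be accurate: the result prints as a Match, yet still satisfies the (i, j, k) description. A minimal check:

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, " abcd", "abcd abcd")
m = s.find_longest_match(0, 5, 0, 9)
print(m)                 # Match(a=0, b=4, size=5)
print(m.a, m.b, m.size)  # 0 4 5
i, j, k = m              # the (i, j, k) reading from the docstring still works
print(m == (0, 4, 5))    # True
```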

Docstring and doc for .get_matching_blocks():

See code snippet above for beginning of text. Unlike .find_longest_match, there is no mention of the changed return.

In 2.7, it is a list of Match triples.
In 3.x, it is an iterator (a map object) of Match triples, because map() returns an iterator in 3.x instead of a list.
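A short illustration of the 3.x map() behavior behind this difference, using a standalone Match definition and illustrative block values: the map object does not display its contents and can only be consumed once.

```python
from collections import namedtuple

Match = namedtuple('Match', 'a b size')

blocks = map(Match._make, [(0, 0, 2), (3, 2, 2), (5, 4, 0)])
print(type(blocks).__name__)  # map (in 2.x, map() returned a list instead)
print(list(blocks))  # [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
print(list(blocks))  # []: the map object is exhausted after one pass
```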

For the latter reason, the example in the 3.x doc must be changed to
>>> list(s.get_matching_blocks())

The docstring was already changed to pass doctest. The untested doc was not.
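The corrected doc example can be checked with a short script using the doc's example strings. Wrapping the call in list() gives the same display whether the method returns a list or a map iterator:

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abxcd", "abcd")
# list() normalizes the result whether get_matching_blocks returns a list or a map.
blocks = list(s.get_matching_blocks())
print(blocks)
# [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
```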

I am not sure how to properly document the use of a namedtuple in the stdlib. Raymond, what do you think?

assignee: rhettinger
components: Documentation, Library (Lib)
messages: 138799
nosy: rhettinger, terry.reedy
priority: normal
severity: normal
status: open
title: difflib.SequenceMatcher and Match: code and doc bugs
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3

Python tracker <report at bugs.python.org>
