[Python-checkins] cpython: #14332: provide a better explanation of junk in difflib docs

andrew.kuchling python-checkins at python.org
Wed Mar 19 21:44:26 CET 2014


http://hg.python.org/cpython/rev/0a69b1e8b7fe
changeset:   89861:0a69b1e8b7fe
user:        Andrew Kuchling <amk at amk.ca>
date:        Wed Mar 19 16:43:06 2014 -0400
summary:
  #14332: provide a better explanation of junk in difflib docs

Initial patch by Alba Magallanes.

files:
  Doc/library/difflib.rst |  14 +++++++++++---
  Lib/difflib.py          |  26 +++++++++++++-------------
  2 files changed, 24 insertions(+), 16 deletions(-)


diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -27,7 +27,9 @@
    little fancier than, an algorithm published in the late 1980's by Ratcliff and
    Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
    find the longest contiguous matching subsequence that contains no "junk"
-   elements (the Ratcliff and Obershelp algorithm doesn't address junk).  The same
+   elements; these "junk" elements are ones that are uninteresting in some
+   sense, such as blank lines or whitespace.  (Handling junk is an
+   extension to the Ratcliff and Obershelp algorithm.) The same
    idea is then applied recursively to the pieces of the sequences to the left and
    to the right of the matching subsequence.  This does not yield minimal edit
    sequences, but does tend to yield matches that "look right" to people.
@@ -210,7 +212,7 @@
    Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
    delta (a :term:`generator` generating the delta lines).
 
-   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
    (or ``None``):
 
    *linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@
    *charjunk*: A function that accepts a character (a string of length 1), and
    returns if the character is junk, or false if not. The default is module-level
    function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
-   blank or tab; note: bad idea to include newline in this!).
+   blank or tab; it's a bad idea to include newline in this!).
 
    :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
 
@@ -624,6 +626,12 @@
    length 1), and returns true if the character is junk. The default is ``None``,
    meaning that no character is considered junk.
 
+   These junk-filtering functions speed up matching to find
+   differences and do not cause any differing lines or characters to
+   be ignored.  Read the description of the
+   :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+   parameter for an explanation.
+
    :class:`Differ` objects are used (deltas generated) via a single method:
 
 
diff --git a/Lib/difflib.py b/Lib/difflib.py
--- a/Lib/difflib.py
+++ b/Lib/difflib.py
@@ -853,10 +853,9 @@
           and return true iff the string is junk. The module-level function
           `IS_LINE_JUNK` may be used to filter out lines without visible
           characters, except for at most one splat ('#').  It is recommended
-          to leave linejunk None; as of Python 2.3, the underlying
-          SequenceMatcher class has grown an adaptive notion of "noise" lines
-          that's better than any static definition the author has ever been
-          able to craft.
+          to leave linejunk None; the underlying SequenceMatcher class has
+          an adaptive notion of "noise" lines that's better than any static
+          definition the author has ever been able to craft.
 
         - `charjunk`: A function that should accept a string of length 1. The
           module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@
     Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
 
     Optional keyword parameters `linejunk` and `charjunk` are for filter
-    functions (or None):
+    functions, or can be None:
 
-    - linejunk: A function that should accept a single string argument, and
+    - linejunk: A function that should accept a single string argument and
       return true iff the string is junk.  The default is None, and is
-      recommended; as of Python 2.3, an adaptive notion of "noise" lines is
-      used that does a good job on its own.
+      recommended; the underlying SequenceMatcher class has an adaptive
+      notion of "noise" lines.
 
-    - charjunk: A function that should accept a string of length 1. The
-      default is module-level function IS_CHARACTER_JUNK, which filters out
-      whitespace characters (a blank or tab; note: bad idea to include newline
-      in this!).
+    - charjunk: A function that accepts a character (string of length
+      1), and returns true iff the character is junk. The default is
+      the module-level function IS_CHARACTER_JUNK, which filters out
+      whitespace characters (a blank or tab; note: it's a bad idea to
+      include newline in this!).
 
     Tools/scripts/ndiff.py is a command-line front-end to this function.
 
@@ -1680,7 +1680,7 @@
         tabsize -- tab stop spacing, defaults to 8.
         wrapcolumn -- column number where lines are broken and wrapped,
             defaults to None where lines are not wrapped.
-        linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+        linejunk,charjunk -- keyword arguments passed into ndiff() (used by
             HtmlDiff() to generate the side by side HTML differences).  See
             ndiff() documentation for argument default values and descriptions.
         """

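Illustration (not part of the changeset): a minimal sketch of the junk behaviour that the new documentation text describes. The first part reuses the example strings from the find_longest_match() documentation; the ndiff() input lines are made up for illustration.

```python
import difflib

a, b = " abcd", "abcd abcd"

# No junk function: the leading blank is allowed to take part in the match.
plain = difflib.SequenceMatcher(None, a, b)
m_plain = plain.find_longest_match(0, len(a), 0, len(b))

# Blanks treated as junk: a match cannot be anchored on a blank, so the
# matcher picks the leftmost "abcd" in b instead.
junked = difflib.SequenceMatcher(lambda ch: ch == " ", a, b)
m_junk = junked.find_longest_match(0, len(a), 0, len(b))

print(m_plain)  # Match(a=0, b=4, size=5)
print(m_junk)   # Match(a=1, b=0, size=4)

# Junk only steers the matching; a whitespace-only difference is still
# reported by ndiff() even though blanks are junk for IS_CHARACTER_JUNK.
delta = list(difflib.ndiff(["a b\n"], ["a  b\n"],
                           charjunk=difflib.IS_CHARACTER_JUNK))
print("".join(delta), end="")
```

The delta still contains a "- " line and a "+ " line for the two inputs, which is the point of the added paragraph: junk filtering changes which matches are preferred, not which differences are shown.
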
-- 
Repository URL: http://hg.python.org/cpython

