[Python-checkins] cpython: #14332: provide a better explanation of junk in difflib docs
andrew.kuchling
python-checkins at python.org
Wed Mar 19 21:44:26 CET 2014
http://hg.python.org/cpython/rev/0a69b1e8b7fe
changeset: 89861:0a69b1e8b7fe
user: Andrew Kuchling <amk at amk.ca>
date: Wed Mar 19 16:43:06 2014 -0400
summary:
#14332: provide a better explanation of junk in difflib docs
Initial patch by Alba Magallanes.
files:
Doc/library/difflib.rst | 14 +++++++++++---
Lib/difflib.py | 26 +++++++++++++-------------
2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -27,7 +27,9 @@
little fancier than, an algorithm published in the late 1980's by Ratcliff and
Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
find the longest contiguous matching subsequence that contains no "junk"
- elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
+ elements; these "junk" elements are ones that are uninteresting in some
+ sense, such as blank lines or whitespace. (Handling junk is an
+ extension to the Ratcliff and Obershelp algorithm.) The same
idea is then applied recursively to the pieces of the sequences to the left and
to the right of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that "look right" to people.
@@ -210,7 +212,7 @@
Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
delta (a :term:`generator` generating the delta lines).
- Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+ Optional keyword parameters *linejunk* and *charjunk* are filtering functions
(or ``None``):
*linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@
*charjunk*: A function that accepts a character (a string of length 1), and
returns if the character is junk, or false if not. The default is module-level
function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
- blank or tab; note: bad idea to include newline in this!).
+ blank or tab; it's a bad idea to include newline in this!).
:file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
@@ -624,6 +626,12 @@
length 1), and returns true if the character is junk. The default is ``None``,
meaning that no character is considered junk.
+ These junk-filtering functions speed up matching to find
+ differences and do not cause any differing lines or characters to
+ be ignored. Read the description of the
+ :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+ parameter for an explanation.
+
:class:`Differ` objects are used (deltas generated) via a single method:
diff --git a/Lib/difflib.py b/Lib/difflib.py
--- a/Lib/difflib.py
+++ b/Lib/difflib.py
@@ -853,10 +853,9 @@
and return true iff the string is junk. The module-level function
`IS_LINE_JUNK` may be used to filter out lines without visible
characters, except for at most one splat ('#'). It is recommended
- to leave linejunk None; as of Python 2.3, the underlying
- SequenceMatcher class has grown an adaptive notion of "noise" lines
- that's better than any static definition the author has ever been
- able to craft.
+ to leave linejunk None; the underlying SequenceMatcher class has
+ an adaptive notion of "noise" lines that's better than any static
+ definition the author has ever been able to craft.
- `charjunk`: A function that should accept a string of length 1. The
module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@
Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
Optional keyword parameters `linejunk` and `charjunk` are for filter
- functions (or None):
+ functions, or can be None:
- - linejunk: A function that should accept a single string argument, and
+ - linejunk: A function that should accept a single string argument and
return true iff the string is junk. The default is None, and is
- recommended; as of Python 2.3, an adaptive notion of "noise" lines is
- used that does a good job on its own.
+ recommended; the underlying SequenceMatcher class has an adaptive
+ notion of "noise" lines.
- - charjunk: A function that should accept a string of length 1. The
- default is module-level function IS_CHARACTER_JUNK, which filters out
- whitespace characters (a blank or tab; note: bad idea to include newline
- in this!).
+ - charjunk: A function that accepts a character (string of length
+ 1), and returns true iff the character is junk. The default is
+ the module-level function IS_CHARACTER_JUNK, which filters out
+ whitespace characters (a blank or tab; note: it's a bad idea to
+ include newline in this!).
Tools/scripts/ndiff.py is a command-line front-end to this function.
@@ -1680,7 +1680,7 @@
tabsize -- tab stop spacing, defaults to 8.
wrapcolumn -- column number where lines are broken and wrapped,
defaults to None where lines are not wrapped.
- linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+ linejunk,charjunk -- keyword arguments passed into ndiff() (used by
HtmlDiff() to generate the side by side HTML differences). See
ndiff() documentation for argument default values and descriptions.
"""
--
Repository URL: http://hg.python.org/cpython
More information about the Python-checkins
mailing list