[issue2057] difflib: add patch capability

anatoly techtonik report at bugs.python.org
Tue Sep 21 00:06:07 CEST 2010


anatoly techtonik <techtonik at gmail.com> added the comment:

On Mon, Sep 20, 2010 at 11:58 PM, Terry J. Reedy <report at bugs.python.org> wrote:
> Given that difflib produces unified diffs (among 3 others) and that diff.py is a thin command-line wrapper that provides access to all 4 formats (with no default), I consider those two files 'ready'.

Context diff is the default format for the diff.py utility. I tried my
best to change this but failed, and I find that very disappointing in
the Python community. You may take a look at how much time was wasted on
a decision about a file that is neither part of the standard library nor
really used by any of the participants except me:
http://bugs.python.org/issue8355

> So I presume you are referring to your patch.py, which is still labelled experimental.
>
> Difflib works by analyzing sequences a and b with SequenceMatcher to produce a sequence of edits that would produce b from a. It then formats the edits into 1 of 4 formats.
>
> Ideally, a patchlib would have a core SequenceEditor that would apply a sequence of edits (in the same format as SequenceMatcher's outputs) to sequence a to output sequence b. That much seems relatively easy. To be complete, it should also have at least 3 if not 4 parse functions that would produce edit sequences. A corresponding patch.py would then be a thin command-line wrapper over patchlib.
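
To make the quoted proposal concrete, here is a minimal sketch of what such
a SequenceEditor core might look like. The names make_edits and apply_edits
are hypothetical and exist in neither difflib nor patch.py. The sketch also
assumes that each edit carries its own replacement content, because
SequenceMatcher opcodes only hold indices into sequence b.

```python
from difflib import SequenceMatcher


def make_edits(a, b):
    """Turn SequenceMatcher opcodes into self-contained edits.

    Each edit is (i1, i2, replacement): replace a[i1:i2] with replacement.
    Hypothetical helper, not part of difflib.
    """
    edits = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' and 'insert' all reduce to a slice assignment
            edits.append((i1, i2, b[j1:j2]))
    return edits


def apply_edits(a, edits):
    """Apply edits to a copy of sequence a and return the result as a list."""
    out = list(a)
    # Work back to front so that earlier indices stay valid.
    for i1, i2, replacement in reversed(edits):
        out[i1:i2] = replacement
    return out
```

With lists of lines a and b, apply_edits(a, make_edits(a, b)) should
reproduce b. Parsing a diff file into such edits is a separate problem.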

Difflib doesn't produce correct output for the unified format, and
patch.py is not able to parse it correctly; see issue2142 and issue7585
(for Python 2.6). patch.py doesn't operate on sequences of edits: it
has a line-by-line parser (rather than a symbol-by-symbol one), which
produces a list of filenames with information about the changed lines
stored in hunk objects (containing line numbers). There are no
SequenceEditor-style edit sequences.
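
A rough sketch of that kind of line-by-line parsing follows. It is not the
actual patch.py code: the Hunk record, the HUNK_HEADER pattern and the
parse_unified function are names made up for illustration, and the sketch
handles only plain "+++" and "@@" headers.

```python
import re
from collections import namedtuple

# Hypothetical hunk record; the real patch.py classes may look different.
Hunk = namedtuple("Hunk", "start_src len_src start_tgt len_tgt lines")

# "@@ -start,len +start,len @@"; a missing length means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_unified(lines):
    """Return {target filename: [Hunk, ...]} for unified diff lines."""
    files = {}
    hunks = None              # hunk list of the file being parsed
    hunk = None               # hunk whose body is being read
    need_src = need_tgt = 0   # body lines still expected for this hunk
    for line in lines:
        if need_src > 0 or need_tgt > 0:
            tag = line[:1]
            if tag == "\\":   # "\ No newline at end of file"
                continue
            hunk.lines.append(line)
            if tag in (" ", "-"):
                need_src -= 1
            if tag in (" ", "+"):
                need_tgt -= 1
            continue
        if line.startswith("+++ "):
            name = line[4:].split("\t")[0].strip()
            hunks = files.setdefault(name, [])
            continue
        match = HUNK_HEADER.match(line)
        if match and hunks is not None:
            s1, l1, s2, l2 = match.groups()
            need_src = int(l1) if l1 is not None else 1
            need_tgt = int(l2) if l2 is not None else 1
            hunk = Hunk(int(s1), need_src, int(s2), need_tgt, [])
            hunks.append(hunk)
    return files
```

Counting the expected body lines from the hunk header is what lets the
parser tell a removed line that happens to start with "-- " from the "---"
header of the next file.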

> Your comments and a perusal of your code indicates that it has a unified diff parser, a sequence matcher, and command-line wrapper. I guess the immediate question is whether this would be enough for a start.

The unified diff parser is valid only for Subversion-style patches.
Patches produced by difflib are not valid, as stated above. Support for
Mercurial- and git-style patches is to be added later, along with the
ability to create and copy/move files.

The sequence matcher is a line comparison tool with some intelligence to
spare the developer from suffering over CRLF differences. It is good for
the "apply patch" use case, but I do not know how well it would fit a
low-level SequenceEditor API. Patch utilities usually contain additional
logic to apply patches at some offset.
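
To illustrate what line-ending tolerance and offset handling mean in
practice, here is a minimal sketch. The functions same_line and find_hunk
and the max_offset parameter are hypothetical and are not taken from
patch.py or any other patch utility.

```python
def same_line(a, b):
    """Compare two lines while ignoring LF vs CRLF line endings."""
    return a.rstrip("\r\n") == b.rstrip("\r\n")


def find_hunk(source, expected, start, max_offset=10):
    """Return the index in source where the expected lines match, or None.

    The position given by the hunk header (start, 0-based) is tried first,
    then positions up to max_offset lines before and after it.
    """
    offsets = [0]
    for distance in range(1, max_offset + 1):
        offsets += [-distance, distance]
    for offset in offsets:
        pos = start + offset
        if pos < 0 or pos + len(expected) > len(source):
            continue
        window = source[pos:pos + len(expected)]
        if all(same_line(s, e) for s, e in zip(window, expected)):
            return pos
    return None
```

Here expected would hold the context and removed lines of a hunk with
their leading marker characters stripped.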

http://code.google.com/p/google-diff-match-patch/ would be a better
starting point for a low-level difflib API.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2057>
_______________________________________

