difflib confusion

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Jan 23 18:37:43 CET 2008


On 23 ene, 06:59, "krishnakant Mane" <researchb... at gmail.com> wrote:
> On 23/01/2008, Paul Hankin <paul.han... at gmail.com> wrote:
> > On Jan 22, 6:57 pm, "krishnakant Mane" <researchb... at gmail.com> wrote:
> > > hello all,
> > > I have a bit of a confusing question.
> > > firstly I wanted a library which can do an svn like diff with two files.
> > > let's say I have file1 and file2 where file2 contains some thing which
> > > file1 does not have.  now if I do readlines() on both the files, I
> > > have a list of all the lines.
> > > I now want to do a diff and find out which word is added or deleted or
> > > changed.
> > > and that too on which character, if not at least want to know the word
> > > that has the change.
> > > any ideas please?
>
> > Have a look at difflib in the standard library.
>
> I am aware of the difflib library but still can't figure out.
> I know that differences in two lines can be got but how to get it between words?

The base functionality is in SequenceMatcher; this class takes a pair
of sequences of any type (the elements only need to be hashable) and
tries to match them. The sequences may be lists of lines, single lines
(seen as sequences of characters), or lists of words (perhaps obtained
with line.split()).
Built on top of SequenceMatcher, there is a text Differ. It takes two
sequences of lines and does its work in two steps: first it tries to
match blocks of lines (using a SequenceMatcher); then the unmatched
blocks are analyzed further to show intraline differences (with
another SequenceMatcher, treating each line as a sequence of
characters). See the example at http://docs.python.org/lib/differ-examples.html
- perhaps this is what you want.
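
As a rough sketch of the two-step behaviour described above, something
like this should work. The sample lines and the helper name diff_lines
are made up for illustration; only Differ and its compare() method come
from difflib.

```python
from difflib import Differ

# Made-up sample data: two versions of the same two-line text.
old = ["the quick brown fox\n", "jumps over the lazy dog\n"]
new = ["the quick brown fox\n", "jumps over the sleepy dog\n"]

def diff_lines(a, b):
    """Return the Differ output for two lists of lines (hypothetical helper)."""
    return list(Differ().compare(a, b))

result = diff_lines(old, new)

# Lines starting with "  " are common to both inputs, "- " only in the
# first, "+ " only in the second; "? " lines are intraline hints.
for line in result:
    print(line, end="")
```

The "? " lines only show up when two lines are similar enough for
Differ to treat them as a changed pair rather than a delete plus an
insert.
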
Note that Differ has no concept of "word"; if you want to report only
whole-word differences, take a look at its private _fancy_replace
method, which is where the intraline comparison is done.
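
A simpler alternative to touching _fancy_replace, if whole words are
all you need, is to feed word lists straight to SequenceMatcher. This
is only a sketch: word_changes is a hypothetical helper name, and
splitting on whitespace with str.split() is an assumption about what
counts as a "word".

```python
from difflib import SequenceMatcher

def word_changes(old_line, new_line):
    """List the word-level differences between two lines.

    Returns (tag, old_words, new_words) tuples, where tag is one of
    'replace', 'delete' or 'insert' as reported by get_opcodes().
    """
    a = old_line.split()
    b = new_line.split()
    matcher = SequenceMatcher(None, a, b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # skip the runs of words that match
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes

print(word_changes("jumps over the lazy dog",
                   "jumps over the sleepy brown dog"))
```

The indices i1, i2, j1, j2 are positions in the word lists, so they
tell you which word changed, not which character.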

--
Gabriel Genellina


