[Python-ideas] enhance filecmp to support text-and-universal-newline-mode file comparison
gagsl-py2 at yahoo.com.ar
Wed Jun 24 17:15:14 CEST 2009
[Please keep the discussion in the list. Also, please avoid top posting (corrected below)]
> On 6/20/09, Gabriel Genellina <gagsl-py2 at yahoo.com.ar>
> > On Thu, 18 Jun 2009 11:04:34 -0300, zhong nanhai
> > <higerinbeijing-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org>
> >> So is it a good idea to enhance filecmp to support
> >> universal-newline mode? If so, we can compare files from
> >> different operating systems, and if they have the same content,
> >> filecmp.cmp would return True.
> > With aid from itertools.izip_longest, it's a one-liner:
> > py> print repr(open("one.txt","rb").read())
> > 'hello\nworld!\nlast line\n'
> > py> print repr(open("two.txt","rb").read())
> > 'hello\r\nworld!\r\nlast line\r\n'
> > py> import filecmp
> > py> filecmp.cmp("one.txt", "two.txt", False)
> > False
> > py> from itertools import izip_longest
> > py> f1 = open("one.txt", "rU")
> > py> f2 = open("two.txt", "rU")
> > py>
> > py> print all(line1==line2 for line1,line2 in izip_longest(f1,f2))
> > True
> > Currently filecmp considers both files as binary, not text; if they
> > differ in size they're considered different and the contents are not
> > even read.
> > If you want a generic text-mode file comparison, there are other
> > factors to consider in addition to line endings: character encoding,
> > BOM, character case, whitespace... All of those may be considered
> > "irrelevant differences" by some people. A generic text file
> > comparison should take all of them into account.
--- On Fri, 19-Jun-09, zhong nanhai <higerinbeijing at gmail.com> wrote:
> Thanks for your suggestion.
> You are right, and there are a lot of things to consider if we want to
> make filecmp support text comparison. But I think we can just do a
> small feature enhancement, e.g. only the universal-newline mode. I am
> not clear on how filecmp implements the file comparison, so could you
> tell me more about that?
> And if, in the source of filecmp, it compares files just by reading
> them line by line, then we can do some further comparisons when
> encountering a newline flag (meaning the end of a line).
You can see it yourself, in lib/filecmp.py in your Python installation.
It does a binary comparison only -- and it does not read anything if the file sizes differ. A text comparison should use a different algorithm; the code above already ignores end-of-line differences and stops as soon as two lines differ. One could enhance it to add support for the other options mentioned earlier.
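As a rough illustration of such an enhancement, here is a minimal sketch. It assumes Python 3, where izip_longest is spelled zip_longest and text mode translates line endings by default. The name text_files_equal and its options (encoding, ignore_case, ignore_trailing_whitespace) are hypothetical; nothing like this exists in filecmp.

```python
import os
import tempfile
from itertools import zip_longest  # izip_longest in Python 2


def text_files_equal(path1, path2, encoding="utf-8",
                     ignore_case=False, ignore_trailing_whitespace=False):
    """Compare two text files line by line, ignoring line-ending style.

    Hypothetical helper for illustration; not part of filecmp.
    """
    def normalize(line):
        if ignore_trailing_whitespace:
            line = line.rstrip()  # also drops the line terminator
        if ignore_case:
            line = line.casefold()
        return line

    # newline=None (the default) turns "\r\n" and "\r" into "\n" on reading.
    # Passing encoding="utf-8-sig" would additionally skip a leading BOM.
    with open(path1, encoding=encoding, newline=None) as f1, \
         open(path2, encoding=encoding, newline=None) as f2:
        # zip_longest pads the shorter file with None, so files with a
        # different number of lines compare unequal; all() stops at the
        # first mismatch.
        return all(
            a is not None and b is not None and normalize(a) == normalize(b)
            for a, b in zip_longest(f1, f2)
        )


# Usage: same text, different line endings, and one difference in case.
tmp = tempfile.mkdtemp()
one = os.path.join(tmp, "one.txt")
two = os.path.join(tmp, "two.txt")
with open(one, "wb") as f:
    f.write(b"hello\nworld!\nlast line\n")
with open(two, "wb") as f:
    f.write(b"Hello\r\nworld!\r\nlast line\r\n")

strict = text_files_equal(one, two)
relaxed = text_files_equal(one, two, ignore_case=True)
print(strict, relaxed)
```

Each "irrelevant difference" becomes an explicit option rather than a default, so the caller decides what counts as equal.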