Re: [Python-ideas] enhance filecmp to support text-and-universal-newline-mode file comparison

June 24, 2009


[Please keep the discussion on the list. Also, please avoid top-posting (corrected below).]
...
On 6/20/09, Gabriel Genellina <gagsl-py2@yahoo.com.ar> wrote:
...
On Thu, 18 Jun 2009 11:04:34 -0300, zhong nanhai <higerinbeijing-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
...
So, is it a good idea to enhance filecmp to support universal-newline mode? If so, we could compare files from different operating systems, and if they have the same content, filecmp.cmp would return True.
With aid from itertools.izip_longest, it's a one-line recipe:
py> print repr(open("one.txt","rb").read())
'hello\nworld!\nlast line\n'
py> print repr(open("two.txt","rb").read())
'hello\r\nworld!\r\nlast line\r\n'
py> import filecmp
py> filecmp.cmp("one.txt", "two.txt", False)
False
py> from itertools import izip_longest
py> f1 = open("one.txt", "rU")
py> f2 = open("two.txt", "rU")
py>
py> print all(line1==line2 for line1,line2 in izip_longest(f1,f2))
True
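On Python 3, izip_longest is spelled itertools.zip_longest, print is a function, and the "U" mode flag is gone (it was removed in Python 3.11) because text mode already translates "\r\n" and "\r" to "\n" by default. A rough Python 3 equivalent of the recipe, which also closes the files, might look like this; the helper name same_text and the explicit UTF-8 default are assumptions for illustration only:

```python
from itertools import zip_longest


def same_text(path1, path2, encoding="utf-8"):
    """Hypothetical helper: True if both files have the same lines, ignoring line-ending style."""
    # newline=None (the default) enables universal newlines: "\r\n" and "\r" become "\n".
    with open(path1, encoding=encoding, newline=None) as f1, \
         open(path2, encoding=encoding, newline=None) as f2:
        # zip_longest pads the shorter file with None, so extra lines count as a difference;
        # all() stops at the first pair of lines that differ.
        return all(line1 == line2 for line1, line2 in zip_longest(f1, f2))
```

As in the session above, a file without a trailing newline on its last line is still reported as different from one that has it.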
Currently filecmp considers both files as binary, not text; if they differ in size they're considered different and the contents are not even read.
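A simplified sketch of that binary behaviour follows. It is not the actual Lib/filecmp.py source, which additionally checks that both paths are regular files, honours the shallow flag by comparing os.stat signatures, and caches results. The names binary_cmp and CHUNK are hypothetical:

```python
import os

CHUNK = 8 * 1024  # arbitrary chunk size chosen for this sketch


def binary_cmp(path1, path2):
    """Byte-for-byte comparison that skips reading when sizes differ (simplified sketch)."""
    # A size mismatch means the files cannot be identical, so neither file is opened.
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            chunk1 = f1.read(CHUNK)
            chunk2 = f2.read(CHUNK)
            if chunk1 != chunk2:
                return False
            if not chunk1:  # both files reached EOF together
                return True
```

In the session above, two.txt is three bytes longer than one.txt (one extra "\r" per line), so the size check alone is enough to report the files as different.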
If you want a generic text-mode file comparison, there are other factors to consider in addition to line endings: character encoding, BOM, character case, whitespace... All of those may be considered "irrelevant differences" by some people. A generic text file comparison should take all of them into account.
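To make the list of "irrelevant differences" concrete, here is a hypothetical Python 3 comparison with a few such options. Nothing here is part of filecmp; the name text_cmp, its parameters, and the choice to ignore a missing final newline are all assumptions for illustration:

```python
from itertools import zip_longest


def text_cmp(path1, path2, encoding="utf-8-sig", ignore_case=False, ignore_whitespace=False):
    """Hypothetical text comparison; every option here is illustrative, not a filecmp API."""

    def normalize(line):
        if line is None:  # padding from zip_longest: one file has more lines
            return None
        # Drop the terminator (already translated to "\n" by universal newlines).
        line = line.rstrip("\n")
        if ignore_whitespace:
            # Collapse runs of whitespace and trim both ends.
            line = " ".join(line.split())
        if ignore_case:
            line = line.casefold()
        return line

    # The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if present.
    with open(path1, encoding=encoding) as f1, open(path2, encoding=encoding) as f2:
        return all(normalize(a) == normalize(b) for a, b in zip_longest(f1, f2))
```

Both files are decoded with the same encoding here; comparing, say, a Latin-1 file with a UTF-8 file would need a separate encoding per file.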
--- On Fri 19-Jun-09, zhong nanhai <higerinbeijing@gmail.com> wrote:
...
Thanks for your suggestion.
You are right, and there are a lot of things to consider if we want to make filecmp support text comparison. But I think we could do just a small feature enhancement, e.g. only universal-newline mode. I am not clear on how filecmp implements file comparison, so could you tell me more about that?
And if the filecmp source compares files by reading them line by line, then we could do some further comparison when encountering a newline (meaning the end of a line).
You can see for yourself in lib/filecmp.py in your Python installation.
It does a binary comparison only, and it does not read anything at all if the file sizes differ. A text comparison needs a different algorithm; the code above already ignores end-of-line differences and stops as soon as two lines differ. One could extend it to support the other options mentioned earlier.


-- 
Gabriel Genellina


