
[Please keep the discussion in the list. Also, please avoid top posting (corrected below)]
On 6/20/09, Gabriel Genellina <gagsl-py2@yahoo.com.ar> wrote:
On Thu, 18 Jun 2009 11:04:34 -0300, zhong nanhai <higerinbeijing-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
So is it a good idea to enhance filecmp to support universal-newline mode? If so, we could compare files from different operating systems, and if they have the same content, filecmp.cmp would return True.
With aid from itertools.izip_longest, it's a one-line recipe:
py> print repr(open("one.txt","rb").read())
'hello\nworld!\nlast line\n'
py> print repr(open("two.txt","rb").read())
'hello\r\nworld!\r\nlast line\r\n'
py> import filecmp
py> filecmp.cmp("one.txt", "two.txt", False)
False
py> from itertools import izip_longest
py> f1 = open("one.txt", "rU")
py> f2 = open("two.txt", "rU")
py>
py> print all(line1==line2 for line1,line2 in izip_longest(f1,f2))
True
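For anyone reading this on Python 3, the same recipe might be sketched roughly as below. This is only an illustration under a few assumptions: izip_longest is spelled zip_longest there, the "U" mode flag is no longer needed because default text mode already translates line endings, UTF-8 is assumed as the encoding, and the helper name same_text is made up for this example (it is not part of filecmp).

```python
from itertools import zip_longest


def same_text(path1, path2, encoding="utf-8"):
    """Return True if both files hold the same lines, ignoring EOL style.

    Hypothetical helper for illustration; not part of filecmp.
    """
    # newline=None (the default) translates \r\n and \r to \n on read.
    with open(path1, encoding=encoding, newline=None) as f1, \
         open(path2, encoding=encoding, newline=None) as f2:
        # zip_longest pads the shorter file with None, so an extra line
        # in either file counts as a difference; all() stops at the
        # first pair of lines that differ.
        return all(a == b for a, b in zip_longest(f1, f2))
```

Called as same_text("one.txt", "two.txt"), it should agree with the izip_longest one-liner for files like the two shown in the session.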
Currently filecmp considers both files as binary, not text; if they differ in size they're considered different and the contents are not even read.
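To see that behaviour in isolation, here is a small self-contained sketch (Python 3 syntax; the function name demo_filecmp and the temporary file names are just for this example). It writes the same text once with LF and once with CRLF line endings and asks filecmp.cmp for a full, non-shallow comparison.

```python
import filecmp
import os
import tempfile


def demo_filecmp():
    """Compare an LF file and a CRLF file with filecmp.cmp."""
    with tempfile.TemporaryDirectory() as tmp:
        lf = os.path.join(tmp, "one.txt")
        crlf = os.path.join(tmp, "two.txt")
        with open(lf, "wb") as f:
            f.write(b"hello\nworld!\nlast line\n")
        with open(crlf, "wb") as f:
            f.write(b"hello\r\nworld!\r\nlast line\r\n")
        # The CRLF file is three bytes longer (one \r per line), so the
        # size check alone is enough to report the files as different.
        sizes = (os.path.getsize(lf), os.path.getsize(crlf))
        return sizes, filecmp.cmp(lf, crlf, shallow=False)


print(demo_filecmp())  # ((23, 26), False)
```

The sizes in the result make the point: the files are unequal as byte strings even though the text is the same.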
If you want a generic text-mode file comparison, there are other factors to consider in addition to line endings: character encoding, BOM, character case, whitespace... All of those may be considered "irrelevant differences" by some people. A generic text file comparison should take all of them into account.
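As a rough idea of what handling a few of those factors could look like, here is a sketch in Python 3 syntax. Everything in it is an assumption made for illustration: the names normalized_lines and text_equal, the option names, and the choice of "utf-8-sig" as the default encoding (which skips a UTF-8 BOM if there is one). It is not how filecmp works and not a proposal for its API.

```python
from itertools import zip_longest


def normalized_lines(path, encoding="utf-8-sig", ignore_case=False,
                     ignore_trailing_space=False):
    """Yield the lines of a text file after optional normalization."""
    # "utf-8-sig" drops a leading UTF-8 BOM if present; text mode
    # translates \r\n and \r to \n while reading.
    with open(path, encoding=encoding) as f:
        for line in f:
            # Drop the line terminator, so a missing newline at the very
            # end of a file is not treated as a difference.
            line = line.rstrip("\n")
            if ignore_trailing_space:
                line = line.rstrip()
            if ignore_case:
                line = line.casefold()
            yield line


def text_equal(path1, path2, **options):
    """Compare two text files line by line after normalization."""
    lines1 = normalized_lines(path1, **options)
    lines2 = normalized_lines(path2, **options)
    # Stops at the first pair of lines that differ.
    return all(a == b for a, b in zip_longest(lines1, lines2))
```

Each option is off by default, so the caller has to say which differences count as irrelevant; for example text_equal("one.txt", "two.txt", ignore_case=True).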
--- On Fri, 19 Jun 2009, zhong nanhai <higerinbeijing@gmail.com> wrote:
Thanks for your suggestion. You are right, there are a lot of things to consider if we want to make filecmp support text comparison. But I think we could just do a small feature enhancement, e.g. only the universal-newline mode. I am not clear on how filecmp implements the file comparison, so could you tell me more about that? If the filecmp source compares files by reading them line by line, then we could do some further comparisons when encountering a newline (the end of a line).
You can see it yourself, in lib/filecmp.py in your Python installation. It does a binary comparison only -- and it does not read anything if file sizes differ. A text comparison should use a different algorithm; the code above already ignores end-of-line differences and stops as soon as two lines differ. One could enhance it to add support for the other options mentioned earlier.

-- 
Gabriel Genellina