How should I compare two txt files separately coming from windows/dos and linux/unix

Piet van Oostrum piet at cs.uu.nl
Fri Jun 12 16:24:47 EDT 2009


>>>>> higer <higerinbeijing at gmail.com> (h) wrote:

>h> On Jun 11, 11:44 am, Chris Rebert <c... at rebertia.com> wrote:
>>> On Wed, Jun 10, 2009 at 8:11 PM, higer<higerinbeij... at gmail.com> wrote:
>>> > I just want to compare two files, one from Windows and the other from
>>> > Unix. But I do not want to compare them by reading them line by
>>> > line. Then I found the filecmp module, which is used for file and
>>> > directory comparisons. However, when I use two identical files (one from
>>> > Unix, one from Windows, with the same content) to test its cmp
>>> > function, filecmp.cmp returned False.
>>> 
>>> > Later, I found that Windows uses '\r\n' as the newline sequence but Unix
>>> > uses '\n', so filecmp.cmp thinks they are different and returns False.
>>> > So, can anyone tell me whether there is an option like IgnoreNewline
>>> > that ignores the newline differences between
>>> > platforms? If not, I think filecmp may not be a good file comparison
>>> 
>>> Nope, there's no such flag. You could run the files through either
>>> `dos2unix` or `unix2dos` beforehand though, which would solve the
>>> problem.
>>> Or you could write the trivial line comparison code yourself and just
>>> make sure to open the files in Universal Newline mode (add 'U' to the
>>> `mode` argument to `open()`).
>>> You could also file a bug (a patch to add newline insensitivity would
>>> probably be welcome).
>>> 
>>> Cheers,
>>> Chris
>>> --http://blog.rebertia.com

>h> Thank you very much. Adding the 'U' argument works perfectly, and I
>h> think it is definitely worth reporting this as a bug to Python.org, as
>h> you say.

Filecmp does a binary compare, not a text compare. It starts by
comparing the sizes of the files; if they differ, the files must be
different. If the sizes are equal, it compares the bytes by reading the
files in large blocks. Comparing text files would be quite different,
especially when ignoring line separators, because two files with the
same text but different line endings already differ in size. Maybe
comparing text files should be added as a new feature.
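
To illustrate the difference, here is a minimal sketch of a text compare
that ignores line separators. It is not part of filecmp; the function name
text_files_equal and the utf-8 default are assumptions made for this
example. It is written for Python 3, where opening a file in text mode
with newline=None already translates '\r\n' and '\r' to '\n', so the old
'U' mode flag is not needed.

```python
import filecmp
import os
import tempfile
from itertools import zip_longest


def text_files_equal(path_a, path_b, encoding="utf-8"):
    """Return True if both files hold the same lines, ignoring newline style.

    Hypothetical helper for illustration; not a filecmp function.
    """
    # newline=None (the default) enables universal newlines: "\r\n" and
    # "\r" are translated to "\n" while reading.
    with open(path_a, encoding=encoding, newline=None) as fa, \
         open(path_b, encoding=encoding, newline=None) as fb:
        # zip_longest pads the shorter file with None, so files with a
        # different number of lines produce a mismatching pair.
        return all(a == b for a, b in zip_longest(fa, fb))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        unix_path = os.path.join(tmp, "unix.txt")
        dos_path = os.path.join(tmp, "dos.txt")
        # Write bytes directly so the line endings are exactly as shown.
        with open(unix_path, "wb") as f:
            f.write(b"one\ntwo\n")
        with open(dos_path, "wb") as f:
            f.write(b"one\r\ntwo\r\n")

        # Byte-wise compare: the sizes differ (8 vs. 10 bytes).
        print(filecmp.cmp(unix_path, dos_path, shallow=False))  # False
        # Line-wise text compare with newline translation.
        print(text_files_equal(unix_path, dos_path))            # True
```

This reads both files line by line, so it gives up the block-wise speed
of the binary compare, and it assumes both files use the same encoding.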
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org


