Steve Holden steve at holdenweb.com
Thu Aug 9 17:55:20 CEST 2007

Jason wrote:
> On Aug 9, 8:46 am, "special_dragonfly" <Domi... at PLEASEASK.co.uk>
> wrote:
>> <dijkstra.ar... at gmail.com> wrote in message
>>> http://docs.python.org/lib/module-filecmp.html
>> My understanding from reading that is that it only looks at the file names
>> themselves and not their contents: whether filename1 == filename2 and, in the
>> case of the function below it, whether one directory has files that are in
>> the other.
>> Correct me if I'm wrong.
>> Dom
>> P.S. md5 or sha hash is what I'd go for, short of doing:
>> MyFirstFile=open("file1.xls", "rb")
>> MySecondFile=open("file2.xls", "rb")
>> if MyFirstFile.read()==MySecondFile.read():  # compare bytes, not file objects
>>     print("True")
>> although this won't tell you where they're different, just that they are...
> You're incorrect.  If the shallow flag is not given or is true, the
> results of os.stat are used to compare the two files, so if they have
> the same size, change times, etc, they're considered the same.
> If the shallow flag is given and is false, their contents are
> compared.  In either case, the results are cached for efficiency's
> sake.
>   --Jason
> The documentation for filecmp.cmp is:
>   cmp(f1, f2[, shallow])
>       Compare the files named f1 and f2, returning True if they seem
>       equal, False otherwise.
>       Unless shallow is given and is false, files with identical
>       os.stat() signatures are taken to be equal.
>       Files that were compared using this function will not be
>       compared again unless their os.stat() signature changes.
>       Note that no external programs are called from this function,
>       giving it portability and efficiency.

This discussion seems to assume that Excel spreadsheets are stored in
some canonical form, so that two spreadsheets with the same functionality
are always identical on disk to the last bit. I very much doubt this is
true (consider, as an example, the file properties that can be set).

So really you need to define "equality". So far the tests discussed have
concentrated on identifying byte-for-byte identical files.
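
As a rough sketch of what "defining equality" might look like in code: the function below compares SHA-256 digests by default (byte-for-byte equality), but accepts an optional normalize callable that reduces each file to whatever you decide actually matters. The names file_digest, files_equal and normalize are hypothetical, and a real normalize for Excel files would need a spreadsheet parser, which is not shown here.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_equal(path_a, path_b, normalize=None):
    """Compare two files under a chosen definition of equality.

    With normalize=None the files must be byte-for-byte identical.
    Otherwise normalize(bytes) is applied to each file's contents and
    the results are compared (an assumption: both files fit in memory).
    """
    if normalize is None:
        return file_digest(path_a) == file_digest(path_b)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return normalize(fa.read()) == normalize(fb.read())
```

For example, if the first line of each file were metadata you did not care about, a normalize such as lambda data: data.split(b"\n", 1)[1] would treat two files as equal whenever the remaining bytes match.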

