filecmp.cmp() cache

Thu Feb 15 11:56:08 EST 2007

Mattias Brändström wrote:

> I have a question about filecmp.cmp(). The short code snippet below
> does not behave as I would expect:
> 
> import filecmp
> 
> f0 = "foo.dat"
> f1 = "bar.dat"
> 
> f = open(f0, "w")
> f.write("1:2")
> f.close()
> 
> f = open(f1, "w")
> f.write("1:2")
> f.close()
> 
> print "cmp 1: " + str(filecmp.cmp(f0, f1, False))
> 
> f = open(f1, "w")
> f.write("2:3")
> f.close()
> 
> print "cmp 2: " + str(filecmp.cmp(f0, f1, False))
> 
> I would expect the second comparison to return False instead of True.
> Looking at the docs for filecmp.cmp() I found the following: "This
> function uses a cache for past comparisons and the results, with a
> cache invalidation mechanism relying on stale signatures.". I guess
> that this is the reason for my test case failing.
> 
> Is there someone here that can tell me how I should invalidate this
> cache? If that is not possible, what workaround could I use? I guess
> that I can write my own file comparison function, but I would not like
> to have to do that since we have filecmp.
> 
> Any ideas?

You can clear the cache with

filecmp._cache = {}

as a glance into the filecmp module would have shown.
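
For what it's worth, on Python 3.4 and later the same thing is available as the public function filecmp.clear_cache(). The following is only a sketch of the stale-cache situation and the fix. It assumes Python 3, works in a temporary directory, and calls os.utime() to pin both files to one arbitrary timestamp, which imitates a coarse one-second mtime resolution. The names PINNED_NS, write_pinned() and demo() are made up for the illustration.

```python
import filecmp
import os
import tempfile

# Arbitrary whole-second timestamp (in nanoseconds) used to pin the mtime.
PINNED_NS = 1_000_000_000 * 10**9


def write_pinned(path, text):
    """Write text to path, then force its atime/mtime to PINNED_NS."""
    with open(path, "w") as f:
        f.write(text)
    os.utime(path, ns=(PINNED_NS, PINNED_NS))


def demo():
    """Return three comparison results: first, stale, after clearing."""
    filecmp.clear_cache()  # start from an empty cache
    with tempfile.TemporaryDirectory() as tmp:
        f0 = os.path.join(tmp, "foo.dat")
        f1 = os.path.join(tmp, "bar.dat")
        write_pinned(f0, "1:2")
        write_pinned(f1, "1:2")
        first = filecmp.cmp(f0, f1, shallow=False)

        # Same size, same pinned mtime, different content.
        write_pinned(f1, "2:3")
        stale = filecmp.cmp(f0, f1, shallow=False)

        # Drop the cached result and compare again.
        filecmp.clear_cache()
        fresh = filecmp.cmp(f0, f1, shallow=False)
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

On a Python 3 interpreter I would expect demo() to return (True, True, False): the middle value is the stale cached answer, the last one is the answer after clearing.
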
If you don't want to use the cache at all (untested):

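class NoCache:
    """Stand-in for the cache dict that never stores anything."""
    def __setitem__(self, key, value):
        pass  # discard every result
    def get(self, key):
        return None  # always report a cache miss

filecmp._cache = NoCache()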
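
A variation on the class above, again only a sketch: the filecmp source in recent Python 3 releases appears to call len() on the cache and to empty it through clear(), neither of which the minimal class provides. Subclassing dict and ignoring stores covers both. This still replaces a private attribute, so it may break with any release. NullCache and disable_cache() are names made up here.

```python
import filecmp


class NullCache(dict):
    """A dict that silently drops every store, so lookups always miss."""

    def __setitem__(self, key, value):
        pass  # never remember a comparison result


def disable_cache():
    """Swap the module's private cache for one that stays empty."""
    filecmp._cache = NullCache()
```

After disable_cache() every filecmp.cmp(a, b, shallow=False) call should read both files again, at the cost of losing the speed-up for repeated comparisons.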

Alternatively, an update to Python 2.5 might work, as the type of
os.stat(filename).st_mtime was changed from int to float and now offers
subsecond resolution, so a rewrite within the same second should no
longer leave the file's signature unchanged.
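
To make the timestamp argument concrete: as far as I can tell from the module source, filecmp decides whether a cached result is still valid by comparing a small signature of file type, size and modification time. The sketch below imitates that idea with a made-up helper signature() and fake stat results built from types.SimpleNamespace. The subsecond flag is an invention of this example to contrast integer with float mtimes, and the timestamps are arbitrary.

```python
import stat
from types import SimpleNamespace


def signature(st, subsecond=True):
    """Return (file type, size, mtime) for a stat-like object.

    With subsecond=False the mtime is truncated to whole seconds,
    imitating an integer st_mtime.
    """
    mtime = st.st_mtime if subsecond else int(st.st_mtime)
    return (stat.S_IFMT(st.st_mode), st.st_size, mtime)


# Two fake stat results: same size, written a quarter second apart.
before = SimpleNamespace(st_mode=stat.S_IFREG | 0o644, st_size=3,
                         st_mtime=1171558568.25)
after = SimpleNamespace(st_mode=stat.S_IFREG | 0o644, st_size=3,
                        st_mtime=1171558568.50)

# Whole seconds: the rewrite is invisible, so a cached result looks valid.
print(signature(before, subsecond=False) == signature(after, subsecond=False))
# Float seconds: the signatures differ, so the cache entry is not reused.
print(signature(before) == signature(after))
```

The first comparison prints True and the second False. Whether real files behave like the second case also depends on the timestamp resolution of the filesystem.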

Peter