[New-bugs-announce] [issue41354] filecmp.cmp documentation does not match actual code

Christof Hanke report at bugs.python.org
Tue Jul 21 03:29:03 EDT 2020


New submission from Christof Hanke <christof.hanke at mpcdf.mpg.de>:

help(filecmp.cmp) says:

"""
cmp(f1, f2, shallow=True)
    Compare two files.
    
    Arguments:
    
    f1 -- First file name
    
    f2 -- Second file name
    
    shallow -- Just check stat signature (do not read the files).
               defaults to True.
    
    Return value:
    
    True if the files are the same, False otherwise.
    
    This function uses a cache for past comparisons and the results,
    with cache entries invalidated if their stat information
    changes.  The cache may be cleared by calling clear_cache().
"""

However, looking at the code, the shallow argument is only taken into account if the stat signatures are the same:
"""
    s1 = _sig(os.stat(f1))
    s2 = _sig(os.stat(f2))
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    if shallow and s1 == s2:
        return True
    if s1[1] != s2[1]:
        return False

    outcome = _cache.get((f1, f2, s1, s2))
    if outcome is None:
        outcome = _do_cmp(f1, f2)
        if len(_cache) > 100:      # limit the maximum size of the cache
            clear_cache()
        _cache[f1, f2, s1, s2] = outcome
    return outcome
"""

Therefore, if I call cmp with shallow=True and the stat signatures differ,
cmp actually does a "deep" compare and reads the files.
This "deep" compare, however, does not check the full stat signatures:
only the file sizes are compared before the contents are read.
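A minimal reproduction sketch of the behaviour described above, assuming the cmp() code quoted earlier. It uses only the public filecmp.cmp, filecmp.clear_cache, os.utime and tempfile APIs; the helper name demo() and the file names are hypothetical. All three files get distinct mtimes, so no two stat signatures match. If shallow=True really meant "just check the stat signature", both calls would return False; instead the first call returns True because the contents were read.

```python
import filecmp
import os
import tempfile


def demo():
    """Return cmp(shallow=True) results for two pairs of files with differing mtimes."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        c = os.path.join(d, "c.txt")
        # a and b have identical contents; c has the same size but other bytes.
        for path, data in ((a, b"hello"), (b, b"hello"), (c, b"world")):
            with open(path, "wb") as f:
                f.write(data)
        # Give every file a distinct mtime so no two stat signatures match.
        os.utime(a, (1_000_000_000, 1_000_000_000))
        os.utime(b, (1_000_000_100, 1_000_000_100))
        os.utime(c, (1_000_000_200, 1_000_000_200))
        filecmp.clear_cache()  # avoid results cached by earlier calls
        same_content = filecmp.cmp(a, b, shallow=True)
        diff_content = filecmp.cmp(a, c, shallow=True)
    return same_content, diff_content


if __name__ == "__main__":
    print(demo())  # (True, False) with the cmp() code quoted above
```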

Thus I propose the following patch:
cmp always checks the "full" signature and returns False if it differs;
if shallow is true and the signatures match, it returns True.
It does not make sense to me that a "deep" compare only compares the size,
but not the mtime.


--- filecmp.py.orig     2020-07-16 12:00:57.000000000 +0200
+++ filecmp.py  2020-07-16 12:00:30.000000000 +0200
@@ -52,10 +52,10 @@
     s2 = _sig(os.stat(f2))
     if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
         return False
-    if shallow and s1 == s2:
-        return True
-    if s1[1] != s2[1]:
+    if s1 != s2:
         return False
+    if shallow:
+        return True
 
     outcome = _cache.get((f1, f2, s1, s2))
     if outcome is None:
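For illustration, the logic of the patched cmp() could be sketched as a standalone, cache-free function. This is a hypothetical sketch, not the filecmp implementation: proposed_cmp, _signature and _same_contents are made-up names, _signature is assumed to mirror the private filecmp._sig (file type, size, mtime), and the chunk size is an arbitrary choice. Note one consequence of the proposal: two files with identical contents but different mtimes are reported as different even with shallow=False.

```python
import os
import stat
import tempfile

BUFSIZE = 8 * 1024  # chunk size for the content comparison (arbitrary)


def _signature(st):
    # Assumed to mirror filecmp's private _sig(): (file type, size, mtime).
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def _same_contents(f1, f2):
    # Read both files in chunks and compare them byte for byte.
    with open(f1, "rb") as fp1, open(f2, "rb") as fp2:
        while True:
            b1 = fp1.read(BUFSIZE)
            b2 = fp2.read(BUFSIZE)
            if b1 != b2:
                return False
            if not b1:  # both files exhausted at the same point
                return True


def proposed_cmp(f1, f2, shallow=True):
    """Hypothetical, cache-free version of cmp() with the proposed patch applied."""
    s1 = _signature(os.stat(f1))
    s2 = _signature(os.stat(f2))
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    # Patched behaviour: any signature mismatch (including mtime) means "different".
    if s1 != s2:
        return False
    if shallow:
        return True
    return _same_contents(f1, f2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a"), os.path.join(d, "b")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"hello")
        os.utime(a, (1_000_000_000, 1_000_000_000))
        os.utime(b, (1_000_000_100, 1_000_000_100))
        # Identical contents, different mtimes: reported as different.
        print(proposed_cmp(a, b, shallow=False))
```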

----------
components: Library (Lib)
messages: 374054
nosy: chanke
priority: normal
severity: normal
status: open
title: filecmp.cmp documentation does not match actual code
type: behavior

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41354>
_______________________________________
