os.path.commonprefix inadequacies

Dean & Yang Draayer draayer at surfglobal.net
Tue Jul 31 17:20:44 EDT 2001


The os.path.commonprefix function at first glance seems like a useful
function: to find a common ancestor of a bunch of paths. However, it is
inadequate on several counts, most notably that it simply doesn't work -
at least not in the way one would expect (or blithely assume!). Some
observations, in increasing order of importance:

(1) The implementation is needlessly inefficient.

(2) It's really an operation on STRINGS, not paths. This functionality
belongs in the string module (if anywhere), not in os.path.

(3) Actually, it almost works for any 'sequence of sequences' - which
would be useful - and could easily be generalized in this direction. The
only reason it's string-specific is that it returns '' when there is no
common prefix. The only difficulty in generalizing is deciding what to
return instead.

(4) THE MOST IMPORTANT SHORTCOMING: this function really has nothing to
do with finding a common ancestor in a file/directory hierarchy - see
(2) above. Too bad, since that's what one would expect it to do, and
it's functionality one would certainly like to have available. Alas, it
does NOT behave as described in its docstring - very misleading.
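
To make the complaint in (4) concrete, here is a minimal sketch. The two
paths are made up for illustration, and posixpath (the POSIX flavour of
os.path) is used only so that the result doesn't depend on the platform:

```python
import posixpath  # POSIX flavour of os.path: '/' is the separator everywhere

# Made-up example paths, purely for illustration.
paths = ['/usr/lib/python', '/usr/local/bin']

# commonprefix compares character by character, not component by component.
prefix = posixpath.commonprefix(paths)
print(prefix)  # '/usr/l' - a string prefix, not a directory both paths live under
```

The common ancestor one would expect here is '/usr'; the result '/usr/l'
is merely the longest run of leading characters the two strings share.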


I would like to see this addressed. Here's my two cents:

(1) Modify the existing os.path.commonprefix, and deprecate its use (at
least for finding common ancestors). Perhaps the following definition:

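    def commonprefix(L, empty=''):
        # L is a list of sequences; 'empty' is returned when nothing is shared
        if not L:
            return empty
        # the prefix can't be longer than the shortest item
        m = min(map(len, L))
        if m == 0:
            return empty
        L0 = L[0]
        for i in range(1, len(L)):
            for j in range(m):
                if L[i][j] != L0[j]:
                    # first mismatch: shrink the candidate prefix length
                    m = j
                    if m == 0:
                        return empty
                    break
        return L0[:m]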

The extra argument allows it to work on any sequence of sequence-like
items - e.g. see (2) below. The default value '' should maintain
backward compatibility.
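
As a rough check of that claim, the sketch below restates the definition
so that it is self-contained (the loops use the built-in range) and tries
it on strings, lists and tuples. The sample data is invented for
illustration only:

```python
def commonprefix(L, empty=''):
    # Restated from the proposal above so this snippet runs on its own.
    if not L:
        return empty
    m = min(map(len, L))          # prefix can't exceed the shortest item
    if m == 0:
        return empty
    L0 = L[0]
    for i in range(1, len(L)):
        for j in range(m):
            if L[i][j] != L0[j]:
                m = j             # first mismatch shortens the prefix
                if m == 0:
                    return empty
                break
    return L0[:m]

# Invented sample data.
strings = commonprefix(['spam', 'spade', 'sparrow'])   # default empty=''
lists = commonprefix([[1, 2, 3], [1, 2, 4]], [])       # lists, empty=[]
nothing = commonprefix([(1, 2), (3, 4)], ())           # nothing in common

print(strings)  # 'spa'
print(lists)    # [1, 2]
print(nothing)  # ()
```

Note that a non-empty result is a slice of the first item, so its type
follows the input; the 'empty' argument only matters when nothing is
shared.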


(2) Define an os.path.commonpath that behaves as desired. It would have
to split apart the paths completely and look for common initial runs of
individual path components. Maybe like:

    import os

    def commonpath(thePaths):
        # thePaths is a list of paths (strings)
        # split every path into a list of its components
        thePaths = [p.split(os.sep) for p in thePaths]
        # longest run of leading components shared by all the paths
        parent = commonprefix(thePaths, [])
        return os.sep.join(parent)

Note: (a) This assumes that the paths are either all absolute or all
relative - the user is responsible for ensuring this. (b) It probably
needs to be modified to handle the drive portion correctly.
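
A self-contained sketch of the component-wise idea, to show how it differs
from the character-wise result. Two things here are my own assumptions and
not part of the proposal: the helper common_run is a hypothetical stand-in
for the generalized commonprefix, and the separator is passed in as a
parameter (defaulting to '/') instead of using os.sep, so the output is the
same on every platform:

```python
def common_run(seqs):
    # Hypothetical stand-in for the generalized commonprefix:
    # longest run of leading items shared by every sequence.
    run = []
    for items in zip(*seqs):
        if any(x != items[0] for x in items):
            break
        run.append(items[0])
    return run

def commonpath(thePaths, sep='/'):
    # sep is an assumption for this sketch; the proposal uses os.sep.
    parts = [p.split(sep) for p in thePaths]
    return sep.join(common_run(parts))

# Made-up example paths.
paths = ['/usr/lib/python', '/usr/local/bin']
print(commonpath(paths))                      # '/usr'

# Note (a): mixing absolute and relative paths yields nothing useful.
mixed = commonpath(['/usr/lib', 'usr/lib'])
print(repr(mixed))                            # ''

# Paths sharing only the root also give '', not '/'.
root_only = commonpath(['/usr', '/etc'])
print(repr(root_only))                        # ''
```

The last case suggests that the root, like the drive portion, would need
special handling in a finished version.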


(3) Perhaps even more convenient would be a function that returns not
just the common path prefix but also a list of the paths converted to
relative paths - relative to this common path. For example:

    def splitcommonpath(thePaths):
        # thePaths is a list of paths (strings)
        thePaths = [p.split(os.sep) for p in thePaths]
        # chop common part off the paths
        theBase = commonprefix(thePaths, [])
        thePaths = [p[len(theBase):] for p in thePaths]
        # convert back to strings
        theBase = os.sep.join(theBase)
        thePaths = [os.sep.join(p) for p in thePaths]
        return (theBase, thePaths)
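
To show the intended result, here is a self-contained variant. It is only a
sketch under stated assumptions: it counts the shared leading components
with itertools.takewhile instead of calling commonprefix, it takes the
separator as a parameter (defaulting to '/') in place of os.sep, and the
sample paths are made up:

```python
import posixpath
from itertools import takewhile

def splitcommonpath(thePaths, sep='/'):
    # sep is an assumption for this sketch; the proposal uses os.sep.
    parts = [p.split(sep) for p in thePaths]
    # Count leading "columns" in which every path has the same component.
    shared = len(list(takewhile(lambda col: len(set(col)) == 1, zip(*parts))))
    base = sep.join(parts[0][:shared]) if parts else ''
    rest = [sep.join(p[shared:]) for p in parts]
    return (base, rest)

# Made-up example paths.
paths = ['/usr/lib/python', '/usr/local/bin', '/usr/lib/x']
base, rest = splitcommonpath(paths)
print(base)  # '/usr'
print(rest)  # ['lib/python', 'local/bin', 'lib/x']

# Joining the base with each relative path gives back the originals.
rejoined = [posixpath.join(base, r) for r in rest]
print(rejoined == paths)  # True
```

The round trip holds for this example; a path that is itself the common
base comes back as an empty relative path and would need a decision on how
to represent it.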


Well, that outlines what I would like to see.  Anyone with thoughts or
suggestions on this matter?


Dean Draayer




