
See http://bugs.python.org/issue10395. os.path.commonpath() should be a function that returns the correct longest common sub-path of the specified paths (os.path.commonprefix() is completely useless for this, because it compares the strings character by character). There are some open questions about the details of the *right* behavior:
* What should be a common prefix of '/var/log/apache2' and '/var//log/mysql'?
* What should be a common prefix of '/usr' and '//usr'?
* What should be a common prefix of '/usr/local/' and '/usr/local/'?
* What should be a common prefix of '/usr/local/' and '/usr/local/bin'?
* What should be a common prefix of '/usr/bin/..' and '/usr/bin'?
Please, those who are interested in this feature, give consistent answers to these questions.
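To make the complaint about os.path.commonprefix() concrete: because it works on characters rather than path elements, it can stop in the middle of a directory name. A minimal illustration using posixpath, so that it behaves the same on any platform:

```python
import posixpath

# commonprefix() compares characters, not path elements.
broken = posixpath.commonprefix(["/usr/lib", "/usr/local"])
print(broken)  # -> /usr/l  (not a directory anyone asked for)

# A repeated slash also derails the character-wise comparison.
slashy = posixpath.commonprefix(["/var/log/apache2", "/var//log/mysql"])
print(slashy)  # -> /var/
```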

On 6 Nov, 2012, at 16:27, Serhiy Storchaka <storchaka@gmail.com> wrote:
What should be a common prefix of '/var/log/apache2' and '/var//log/mysql'?
/var/log
What should be a common prefix of '/usr' and '//usr'?
/usr
What should be a common prefix of '/usr/local/' and '/usr/local/'?
/usr/local
What should be a common prefix of '/usr/local/' and '/usr/local/bin'?
/usr/local
What should be a common prefix of '/usr/bin/..' and '/usr/bin'?
/usr/bin

In all cases the path is first split into its elements, then the longest common prefix of the two lists of elements is calculated, and then the elements are joined back up again.

Some cases you don't mention:
* Relative paths that don't share a prefix should raise an exception
* On Windows, two paths that don't have the same drive should raise an exception

The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidentally pass a useless value to some other API and get a confusing exception later on.
Please, those who are interested in this feature, give consistent answers to these questions.
Ronald
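Ronald's recipe above (split into elements, keep the longest run of shared leading elements, join them back) can be sketched as follows. This is a hypothetical, POSIX-only helper named commonpath_sketch, not an existing API; it assumes the answers given in this thread: repeated slashes and '.' elements are dropped, '..' is kept uninterpreted, and mixed absolute/relative input or relative paths with nothing in common raise ValueError.

```python
def commonpath_sketch(paths):
    """Hypothetical helper: element-wise longest common sub-path of POSIX paths.

    Rules follow the answers in this thread (an assumption, not an existing API):
    repeated slashes and '.' elements are dropped, '..' is kept uninterpreted,
    and only the root keeps a trailing separator.
    """
    if not paths:
        raise ValueError("commonpath_sketch() needs at least one path")
    kinds = {p.startswith("/") for p in paths}
    if len(kinds) != 1:
        raise ValueError("cannot mix absolute and relative paths")
    is_abs = kinds.pop()
    # Split into elements; '' entries come from repeated or trailing slashes.
    elements = [[e for e in p.split("/") if e not in ("", ".")] for p in paths]
    common = []
    for column in zip(*elements):
        if any(e != column[0] for e in column):
            break
        common.append(column[0])
    if is_abs:
        return "/" + "/".join(common)
    if not common:
        # Ronald's proposal: unrelated relative paths are an error, not ''.
        raise ValueError("relative paths share no common prefix")
    return "/".join(common)


print(commonpath_sketch(["/var/log/apache2", "/var//log/mysql"]))  # -> /var/log
print(commonpath_sketch(["/usr/bin/..", "/usr/bin"]))              # -> /usr/bin
print(commonpath_sketch(["/usr", "/var"]))                         # -> /
```

Because '..' is treated as an ordinary element, '/usr/bin/..' and '/usr/bin' share 'usr' and 'bin', which gives the '/usr/bin' answer rather than '/usr'.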

It would be nice if, in conjunction with this, os.path.commonprefix were renamed to string.commonprefix, with os.path.commonprefix kept for backwards compatibility (and deprecated). More inline.

On Tue, Nov 6, 2012 at 7:49 AM, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
However, you've left out one key test case: what is commonpath('/usr', '/var')? It seems to me that the only reasonable value is '/'. If you change the semantics so that it either (1) always includes a trailing / or (2) includes a trailing slash if the two paths have it in common, then you don't have the weirdness that in this case it returns a slash and in others it doesn't. I am slightly inclined to (1) at this point. It would also be a bit surprising that there are cases where commonpath(a, a) != a.
seems better than the alternative of interpreting the '..'.
* Relative paths that don't share a prefix should raise an exception
Why? Why is an empty path not a reasonable result?
* On Windows, two paths that don't have the same drive should raise an exception
I disagree. On Unix systems, should two paths that don't have the same drive also raise an exception? What if I'm using this function on Windows to compare two HTTP paths or two paths on a remote Unix system? Raising an exception in either case would be wrong.
Yes, don't return a useless value. An empty string is useful in the relative path case, and '/' is useful in the case of absolute paths that have no common prefix at all.
--- Bruce

Bruce Leban wrote:
But then the common prefix of "/a/b" and "/a/c" would be "/a/", which would be very unexpected -- usually the dirname of a path is not considered to include a trailing slash. The special treatment of the root directory is no weirder than it is anywhere else. It's already special, since in unix it's the only case where a trailing slash is semantically significant. (To the kernel, at least -- a few command line utilities break this rule, but they're screwy.) -- Greg
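Greg's analogy with dirname can be checked directly: the standard library already drops the trailing slash from a parent directory everywhere except at the root.

```python
import posixpath

# An ordinary parent directory comes back without a trailing slash ...
parent_of_b = posixpath.dirname("/a/b")
# ... but the root can only be spelled with one.
parent_of_a = posixpath.dirname("/a")
print(parent_of_b)  # -> /a
print(parent_of_a)  # -> /
```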

This seems to overlap quite a lot with the recent discussion on object-oriented paths (http://mail.python.org/pipermail/python-ideas/2012-October/016338.html), where this question of how paths are represented on different systems was discussed quite extensively. I'm not sure where the thread left off, but if PEP 428 is still going ahead, then maybe this is something that should be brought into it.
David

On Wed, Nov 7, 2012 at 7:15 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

On 7 Nov, 2012, at 3:05, Bruce Leban <bruce@leapyear.org> wrote:
I agree
If you change the semantics so that it either (1) always includes a trailing / or (2) includes a trailing slash if the two paths have it in common, then you don't have the weirdness that in this case it returns a slash and in others it doesn't. I am slightly inclined to (1) at this point.
I'd prefer to only have a path separator at the end when it has semantic meaning. That would mean that only the root of a filesystem tree ("/" on Unix, but also "C:\" and "\\server\share\" on Windows) has a separator at the end.
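Under the rule Ronald prefers, the only results ending in a separator are filesystem roots. On Windows the root of an absolute path can be found with ntpath.splitdrive(), which also recognizes UNC shares. The helper root_of below is hypothetical and only illustrates which strings would keep their trailing backslash; ntpath is used so the sketch runs on any platform.

```python
import ntpath

def root_of(path):
    """Hypothetical helper: the root of an absolute Windows path, with its trailing separator."""
    drive, rest = ntpath.splitdrive(path)
    if not rest.startswith(("\\", "/")):
        raise ValueError("not an absolute path: %r" % path)
    return drive + "\\"

print(root_of("C:\\Windows\\System32"))
print(root_of("\\\\server\\share\\data"))
```

The two calls return C:\ and \\server\share\ respectively, matching the roots listed above.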
It would also be a bit surprising that there are cases where commonpath(a,a) != a.
That's already true, commonpath('/usr//bin', '/usr//bin') would be '/usr/bin' and not '/usr//bin'.
That was the hard choice in the list, my reason for picking this result is that interpreting '..' can change the meaning of a path when dealing with symbolic links and therefore would make the function less useful (and you can always call os.path.normpath when you do want to interpret '..'). Stripping '.' elements would be fine, e.g. commonpath('/usr/./bin/ls', '/usr/bin/sh') could be '/usr/bin'.
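The point about '..' can be seen with os.path.normpath(): it removes '.' and collapses '..' purely textually, without consulting the filesystem, so with symbolic links the collapsed path may name a different directory (os.path.realpath() is the call that resolves symlinks). A caller who does want '..' interpreted can normalize the inputs first, as suggested above.

```python
import posixpath

# '..' is collapsed lexically: if /usr/bin were a symlink, '/usr' might be wrong.
collapsed = posixpath.normpath("/usr/bin/..")
# Removing '.' elements never changes which file is named.
dotless = posixpath.normpath("/usr/./bin/ls")
print(collapsed)  # -> /usr
print(dotless)    # -> /usr/bin/ls
```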
* Relative paths that don't share a prefix should raise an exception
Why? Why is an empty path not a reasonable result?
An empty string is not a valid path. Now that I reconsider this question: "." would be a valid path, and would have a sane meaning.
* On Windows, two paths that don't have the same drive should raise an exception
I disagree. On Unix systems, should two paths that don't have the same drive also raise an exception? What if I'm using this function on Windows to compare two HTTP paths or two paths on a remote Unix system? Raising an exception in either case would be wrong.
The paths in URLs don't have a drive, hence both URL paths would have the "same" drive. More importantly, posixpath.commonpath would be the better choice for comparing two HTTP or remote Unix paths, as that function uses the correct separator (ntpath.commonpath uses a backslash as the separator).

Also: when two paths have a different drive letter or UNC share name, there is no value for the prefix that allows the construction of a path from the common prefix to one of those paths. That is, given

    path1 = "c:\windows"
    path2 = "d:\data"
    pfx = commonpath(path1, path2)

the only value of pfx for which there is a value sfx such that os.path.join(pfx, sfx) == path1 is the empty string, but that value does not refer to a filesystem location. That means you have to explicitly test whether commonpath returns the empty string, because you likely have to behave differently when there is no shared prefix. I'd then prefer that commonpath raise an exception, because it would be too easy to forget to check for this (especially when developing on a Unix-based platform and later porting to Windows). An exception means the code blows up, instead of giving unexpected results (leading to questions like "Why is your program writing junk in my home directory?").
The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidentally pass a useless value to some other API and get a confusing exception later on.
Yes, don't return a useless value. An empty string is useful in the relative path case, and '/' is useful in the case of absolute paths that have no common prefix at all.
"/" *is* the common prefix for absolute paths on Unix that don't share any path elements. As mentioned above, "." (or rather os.path.curdir) would be a sane result for relative paths.
Ronald
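The drive check Ronald describes might look like the sketch below. common_drive is a hypothetical helper; it uses ntpath so it runs on any platform, and it assumes drive letters and share names compare case-insensitively.

```python
import ntpath

def common_drive(paths):
    """Hypothetical helper: the drive or UNC share shared by all Windows paths.

    Raises ValueError instead of returning '' when the drives differ, because ''
    does not name any filesystem location.
    """
    # Assumption: 'c:' and 'C:' (and share names) compare case-insensitively.
    drives = {ntpath.splitdrive(p)[0].lower() for p in paths}
    if len(drives) != 1:
        raise ValueError("paths are not on a single drive: %r" % sorted(drives))
    return ntpath.splitdrive(paths[0])[0]

print(common_drive(["c:\\windows", "C:\\data"]))  # -> c:
```

Calling common_drive(["c:\\windows", "d:\\data"]) raises ValueError, so a caller cannot silently go on with an empty prefix.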
--- Bruce

On 07.11.12 09:22, Ronald Oussoren wrote:
Yes, the current implementation does not preserve the repeated slashes; this is an argument for commonpath(['/usr//bin', '/usr/bin']) returning '/usr/bin' and not '/usr'. However, it would be a bit surprising if there were cases where commonpath([normpath(a), normpath(a)]) != normpath(a).
Stripping '.' elements would be fine, e.g. commonpath('/usr/./bin/ls', '/usr/bin/sh') could be '/usr/bin'.
Maybe.
An empty string is not a valid path. Now that I reconsider this question: "." would be a valid path, and would have a sane meaning.
Looks reasonable, but I am not sure. The returned value will most probably be passed to join(), and this will add an unexpected './' at the start of the path.
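Serhiy's concern about join() is easy to reproduce: a '.' prefix survives joining as a visible './', whereas an empty string simply disappears. Both results name the same location; only the spelling differs.

```python
import posixpath

# A curdir result leaves './' in front of the joined path ...
with_curdir = posixpath.join(posixpath.curdir, "bin")
# ... while an empty-string result vanishes in join().
with_empty = posixpath.join("", "bin")
print(with_curdir)  # -> ./bin
print(with_empty)   # -> bin
```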

On 06.11.12 17:49, Ronald Oussoren wrote:
On 6 Nov, 2012, at 16:27, Serhiy Storchaka <storchaka@gmail.com> wrote:
There are some open questions about details of *right* behavior.
I only asked the questions for which there are different opinions or for which I myself doubt.
I think so too.
What should be a common prefix of '/usr' and '//usr'? /usr
normpath() preserves a leading double slash (but not a triple one). That's why I asked the question.
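The normpath() behaviour Serhiy mentions follows POSIX, which lets exactly two leading slashes have an implementation-defined meaning while treating three or more as one. It is also why an element-wise commonpath that drops empty elements would return '/usr' for two copies of '//usr', even though normpath('//usr') keeps both slashes.

```python
import posixpath

# Exactly two leading slashes are preserved ...
two = posixpath.normpath("//usr")
# ... but three or more collapse to a single slash.
three = posixpath.normpath("///usr")
print(two)    # -> //usr
print(three)  # -> /usr
```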
What should be a common prefix of '/usr/local/' and '/usr/local/'? /usr/local
os.path.split('/usr/local/') is ('/usr/local', ''). Repeated application of os.path.split() gives us ('/', 'usr', 'local', ''). That's why I think it may be appropriate to preserve the trailing slash here. I'm not sure.
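The repeated os.path.split() decomposition can be reproduced with a small loop. split_all is a hypothetical helper name; note the trailing '' element produced by a trailing slash, which is what distinguishes '/usr/local/' from '/usr/local' in this view.

```python
import posixpath

def split_all(path):
    """Hypothetical helper: apply posixpath.split() until the head stops changing."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:
            # Reached the root ('/') or an empty head for relative paths.
            if head:
                parts.append(head)
            break
        parts.append(tail)
        path = head
    parts.reverse()
    return parts

print(split_all("/usr/local/"))     # -> ['/', 'usr', 'local', '']
print(split_all("/usr/local/bin"))  # -> ['/', 'usr', 'local', 'bin']
```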
What should be a common prefix of '/usr/local/' and '/usr/local/bin'? /usr/local
The same considerations apply here as for the previous question. In any case, the common prefix of '/usr/local/etc' and '/usr/local/bin' should be '/usr/local'.
* Relative paths that don't share a prefix should raise an exception
I disagree. The common prefix of relative paths on the same drive is the current directory on that drive (if we decide to drop '..').
* On Windows, two paths that don't have the same drive should raise an exception. The alternative is to return some arbitrary value (like None) that you have to test for, which would IMHO make it too easy to accidentally pass a useless value to some other API and get a confusing exception later on.
Maybe. This should give the same result (None or an exception) as for an empty list or a mix of absolute and relative paths.
Thank you for your answers.

participants (6)
- Bruce Leban
- David Townshend
- Eli Bendersky
- Greg Ewing
- Ronald Oussoren
- Serhiy Storchaka