A wart which should have been repaired in 3.0?
The doc for os.path.commonprefix states: Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time. I remember encountering this in an earlier version of Python 2.x (maybe 2.2 or 2.3?) and "fixed" it to work by pathname components instead of by characters. That had to be reverted because it was a behavior change and broke code which used it for strings which didn't represent paths. After the reversion I then forgot about it. I just stumbled upon it again. It seems to me this would have been a good thing to fix in 3.0. Is this something which could change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? Skip
skip@pobox.com wrote:
The doc for os.path.commonprefix states:
Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.
I remember encountering this in an earlier version of Python 2.x (maybe 2.2 or 2.3?) and "fixed" it to work by pathname components instead of by characters. That had to be reverted because it was a behavior change and broke code which used it for strings which didn't represent paths. After the reversion I then forgot about it.
I just stumbled upon it again. It seems to me this would have been a good thing to fix in 3.0. Is this something which could change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)?
Why can't we add an "allow_fragment" keyword that defaults to True? Then "allow_fragment=False" will stop at the last full directory name and ignore any partial matches on the filenames or the next subdirectory (depending on where the common prefix ends). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Nick> Why can't we add an "allow_fragment" keyword that defaults to Nick> True? Then "allow_fragment=False" will stop at the last full Nick> directory name and ignore any partial matches on the filenames or Nick> the next subdirectory (depending on where the common prefix ends). You could I suppose though that would just be adding another hack on top of existing questionable behavior. I wasn't so concerned with implementation as whether or not a change to the semantics of the function was possible. Skip
<skip <at> pobox.com> writes:
You could I suppose though that would just be adding another hack on top of existing questionable behavior.
Agreed. We should fix the original function so that it has the obvious, intented effect. Leaving the buggy function in place and adding another function with the proper behaviour sounds ridiculous.
>> You could I suppose though that would just be adding another hack on >> top of existing questionable behavior. Antoine> Agreed. We should fix the original function so that it has the Antoine> obvious, intented effect. Leaving the buggy function in place Antoine> and adding another function with the proper behaviour sounds Antoine> ridiculous. If we add commonpath or commonpathprefix or pathprefix, or whatever, then find someplace to move the existing commonprefix function (maybe to the string module or as a class method of string objects?) then could we make a 2to3 fixer for this? Skip
<skip <at> pobox.com> writes:
If we add commonpath or commonpathprefix or pathprefix, or whatever, then find someplace to move the existing commonprefix function (maybe to the string module or as a class method of string objects?) then could we make a 2to3 fixer for this?
IMHO it's a bug, the py3k migration process needn't apply.
Antoine Pitrou wrote:
<skip <at> pobox.com> writes:
If we add commonpath or commonpathprefix or pathprefix, or whatever, then find someplace to move the existing commonprefix function (maybe to the string module or as a class method of string objects?) then could we make a 2to3 fixer for this?
IMHO it's a bug, the py3k migration process needn't apply.
The current behaviour is exactly what one would need to implement bash-style tab completion [1], so I don't get why anyone is calling it "useless" or "obviously broken". It's brokenness isn't obvious at all to me - it just doesn't do what you want it to do. Adding a separate function called "os.path.commonpath" with the behaviour Skip wants sounds like *exactly* the right answer to me. Cheers, Nick. * entries = os.listdir() candidates = [e for e in entries if e.startswith(typed)] if len(candidates) > 1: tab_result = os.path.commonprefix(entries) elif candidates: tab_result = candidates[0] else: tab_result = typed -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Nick Coghlan <ncoghlan <at> gmail.com> writes:
The current behaviour is exactly what one would need to implement bash-style tab completion [1], so I don't get why anyone is calling it "useless" or "obviously broken".
Point taken. Although the fact that it lives in os.path suggests that the function should know about path components instead of ignoring their existence... A generic longest common prefix function would belong elsewhere. The issue people are having with the proposal to create a separate function is that it's a bloat of the API. I don't think the os.path module claims to give utilities for implementing bash-style tab completion, however it is supposed to make manipulation of paths easier -- which returning invalid answers (or, worse, valid but intuitively wrong answers) does not.
Antoine Pitrou wrote:
Nick Coghlan <ncoghlan <at> gmail.com> writes:
The current behaviour is exactly what one would need to implement bash-style tab completion [1], so I don't get why anyone is calling it "useless" or "obviously broken".
Point taken. Although the fact that it lives in os.path suggests that the function should know about path components instead of ignoring their existence... A generic longest common prefix function would belong elsewhere.
The issue people are having with the proposal to create a separate function is that it's a bloat of the API. I don't think the os.path module claims to give utilities for implementing bash-style tab completion, however it is supposed to make manipulation of paths easier -- which returning invalid answers (or, worse, valid but intuitively wrong answers) does not.
True, but it's a matter of weighing up the migration cost of the two options: a) Add a new function (e.g. os.path.commonpath) which works on a path component basis. Zero migration cost, minor ongoing cost in explaining the difference between commonpath (with path component based semantics) and commprefix (with character based semantics). That ongoing cost can largely be handled just by referencing the two functions from each other's documentation (note that they will actually be next to each other in the alphabetical list of os.path functions, and the path component based one will appear before the character based one). b) Deprecate the current semantics of os.path.commonprefix (which will likely involve changing the name anyway, since it is easier to deprecate the old semantics when the new semantics have a different name), add the new path component based semantics, add the character-based semantics back somewhere else. This imposes a major migration cost (since the old commonprefix will at least change its name) with significant potential for confusion due to the semantic changes across versions (if the commonprefix name is reused for the new semantics). If we're going to end up with two functions anyway, why mess with the one which is already there and in use for real programs? Just add a new function with the new semantics and be done with it. Anything else will just cause migration pain without any significant counterbalancing benefit. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Le dimanche 28 décembre 2008 à 16:26 +1000, Nick Coghlan a écrit :
If we're going to end up with two functions anyway, why mess with the one which is already there and in use for real programs?
Well, agreed. I was just hoping we could get away with "fixing" the existing function and voilà :)
Antoine Pitrou wrote:
Le dimanche 28 décembre 2008 à 16:26 +1000, Nick Coghlan a écrit :
If we're going to end up with two functions anyway, why mess with the one which is already there and in use for real programs?
Well, agreed. I was just hoping we could get away with "fixing" the existing function and voilà :)
I'm all for breaking backwards compatibility when it allows some genuine improvements that would otherwise be impossible, but in this particular case a little API bloat seems like the least of the available evils :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
I'm all for breaking backwards compatibility when it allows some genuine improvements that would otherwise be impossible, but in this particular case a little API bloat seems like the least of the available evils :)
I don't think any change is necessary. os.path.commonprefix works just fine on path components: py> p = ["/usr/bin/ls", "/usr/bin/ln"] py> os.path.commonprefix([f.split('/') for f in p]) ['', 'usr', 'bin'] py> p.append("/usr/local/bin/ls") py> os.path.commonprefix([f.split('/') for f in p]) ['', 'usr'] Of course, using it that way would require a library function that reliably splits a path into components; I think one would have to do abspath on arbitrary inputs. Regards, Martin
Martin> I don't think any change is necessary. os.path.commonprefix Martin> works just fine on path components: ... Ummm... >>> os.path.commonprefix(["/export/home", "/etc/passwd"]) '/e' I suppose that's correct given the defined behavior of the function, but it certainly doesn't seem to be very path-like to me. Martin> Of course, using it that way would require a library function Martin> that reliably splits a path into components; I think one would Martin> have to do abspath on arbitrary inputs. See <http://bugs.python.org/issue4755> for what I think is a function with more predictable behavior given that we are discussing paths and not just strings. Skip
"skip" == skip <skip@pobox.com> writes:
Martin> I don't think any change is necessary. os.path.commonprefix Martin> works just fine on path components: skip> Ummm... >>> os.path.commonprefix(["/export/home", "/etc/passwd"]) '/e' skip> I suppose that's correct given the defined behavior of the skip> function, but it certainly doesn't seem to be very path-like to skip> me. I should also point out that most people will not have the foresight to use it the way Martin demonstrated. Documentation or not, I'll be a fair fraction of all usage assumes the return value represents a valid path. Martin> Of course, using it that way would require a library function Martin> that reliably splits a path into components; I think one would Martin> have to do abspath on arbitrary inputs. Kinda what I think os.path.split ought to do. Should I tackle that next? ;-) Skip
Martin> I don't think any change is necessary. os.path.commonprefix Martin> works just fine on path components: ...
Ummm...
>>> os.path.commonprefix(["/export/home", "/etc/passwd"]) '/e'
This calls it with strings, not with path components. As I said, it works fine for path components: py> os.path.commonprefix([f.split('/') for f in ["/export/home", "/etc/passwd"]]) ['']
See <http://bugs.python.org/issue4755> for what I think is a function with more predictable behavior given that we are discussing paths and not just strings.
See above: the function works for lists as well. Regards, Martin
>> See <http://bugs.python.org/issue4755> for what I think is a function >> with more predictable behavior given that we are discussing paths and >> not just strings. Martin> See above: the function works for lists as well. But as you yourself pointed out, Python lacks a reliable split function for filesystem paths. The patch implements different versions for Windows and other platforms because Python supports two separators on that platform. Skip
I think Nick's solution is "Don't let the best be the enemy of the good" Had this been caught before 3.0 release it might be a different solution Let's just add a new function that works "correctly" Martin, it seems to me that a path. method shouldn't require me to pass path components but instead should accept a "path" as its input (or in this case multiple paths). The current usage feels like a string method to me. Not saying it's not useful but it isn't "intuitive". For those that prefer not to add functions all willy-nilly, would it not be better to add a "delimiter" keyword that defaults to False? Then "delimiter=False" will function with the current functionality unchanged while os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/") would properly return 'bob/'
Jeff Hall wrote:
... For those that prefer not to add functions all willy-nilly, would it not be better to add a "delimiter" keyword that defaults to False? Then "delimiter=False" will function with the current functionality unchanged while
os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/")
The proper call should be: os.path.commonprefix(["bob/example", "bob/etc/passwd"], delimiter=True) and output: 'bob' (path to the common directory) Perhaps even call the keyword arg "delimited," rather than "delimiter." On Windows, I'd like to see: os.path.commonprefix(['a/b/c.d/e'f', r'a\b\c.d\eve'], delimited=True) return either 'a/b/c.d' or r'a\b\c.d' Perhaps even ['a', 'b', 'c.d'] (suitable for os.path.join). --Scott David Daniels Scott.Daniels@Acm.Org
I was thinking that the user could just define the delimiter character due to the differences amongst delimiters used in OS's... but if that isn't a problem (Skip seemed to think it wouldn't be) then my solution is functionally identical to the first one he proposed
Jeff> For those that prefer not to add functions all willy-nilly, would Jeff> it not be better to add a "delimiter" keyword that defaults to Jeff> False? Then "delimiter=False" will function with the current Jeff> functionality unchanged while Jeff> os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/") Jeff> would properly return Jeff> 'bob/' On Windows what would you do with this crazy, but valid, path? c:/etc\\passwd I don't do Windows, so don't have any idea if there is even an /etc/passwd file on Windows. I'd guess not, but that's not the point. The point is that you can use both / (aka ntpath.sep) and \ (aka ntpath.altsep) in Windows pathnames. See my patch (issue 4755) for a version of os.path.<whatever> which works as at least I expect and should work cross-platform. Skip
You know, all this path separator and list complication isn't really necessary, when you can just take the os.path.dirname() of the return from commonprefix(). Perhaps we could just add that recommendation to the docs? At 04:46 PM 12/29/2008 -0600, skip@pobox.com wrote:
Jeff> For those that prefer not to add functions all willy-nilly, would Jeff> it not be better to add a "delimiter" keyword that defaults to Jeff> False? Then "delimiter=False" will function with the current Jeff> functionality unchanged while
Jeff> os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/")
Jeff> would properly return
Jeff> 'bob/'
On Windows what would you do with this crazy, but valid, path?
c:/etc\\passwd
I don't do Windows, so don't have any idea if there is even an /etc/passwd file on Windows. I'd guess not, but that's not the point. The point is that you can use both / (aka ntpath.sep) and \ (aka ntpath.altsep) in Windows pathnames. See my patch (issue 4755) for a version of os.path.<whatever> which works as at least I expect and should work cross-platform.
Skip
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/pje%40telecommunity.com
2008/12/30 Phillip J. Eby <pje@telecommunity.com>:
You know, all this path separator and list complication isn't really necessary, when you can just take the os.path.dirname() of the return from commonprefix().
Perhaps we could just add that recommendation to the docs?
Actually, consider the following (on Windows):
python Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.
import os os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo'
This very clearly shows that commonprefix is a string operation rather than a path operation, as it does not respect the equivalence of os.sep and os.altsep. In path semantics, the common prefix is "foo/bar" (or equivalently "foo\\bar"). I'm not sure how to deal with this, except by recommending that all paths passed to os.path.commonprefix should at the very least be normalised via os.path.normpath first - which starts to get clumsy fast. So the "recommended" usage to get the common directory is paths = [...] common = os.path.dirname(os.path.commonprefix([os.path.normpath(p) for p in paths])) Hmm... Paul.
Paul demonstrates the shortcoming of commonprefix: >>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo' With the patch in issue4755: >>> import ntpath >>> ntpath.commonpathprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo\\bar' Ta da ... Skip
At 06:14 AM 12/30/2008 -0600, skip@pobox.com wrote:
Paul demonstrates the shortcoming of commonprefix:
>>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo'
With the patch in issue4755:
>>> import ntpath >>> ntpath.commonpathprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo\\bar'
But it doesn't handle the fact that Windows paths are case-insensitive, or that Posix paths can have symlinks... or that one path might be relative and another absolute... As soon as you move away from being a string operation, you get an endless series of gotchas... none of which are currently documented.
Phillip> But it doesn't handle the fact that Windows paths are Phillip> case-insensitive, or that Posix paths can have symlinks... or Phillip> that one path might be relative and another absolute... Phillip> As soon as you move away from being a string operation, you get Phillip> an endless series of gotchas... none of which are currently Phillip> documented. Well, then we can document (some of?) the gotchas* and work on a better implementation of commonpathprefix. I don't do Windows. You're lucky I got as far as I did with the Windows side of things. ;-) Skip * I would argue that symlinks should be transparent. By the very nature of the operations and the fact that they might be performed on other platforms (import posixpath on Windows for instance) there is not much, if anything, you can infer about the paths themselves other than their structure.
Paul Moore wrote:
2008/12/30 Phillip J. Eby <pje@telecommunity.com>:
You know, all this path separator and list complication isn't really necessary, when you can just take the os.path.dirname() of the return from commonprefix()....
Actually, consider: ...
os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo'
... I'm not sure how to deal with this, except by recommending that all paths passed to os.path.commonprefix should at the very least be normalised via os.path.normpath first - which starts to get clumsy fast. So the "recommended" usage to get the common directory is
paths = [...] common = os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths]))
More trouble with the "just take the dirname": paths = ['/a/b/c', '/a/b/d', '/a/b'] os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths])) give '/a', not '/a/b'. --Scott David Daniels Scott.Daniels@Acm.Org
On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
More trouble with the "just take the dirname":
paths = ['/a/b/c', '/a/b/d', '/a/b'] os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths]))
give '/a', not '/a/b'.
...because that's the correct answer.
But not the answer that is wanted. So the challenge now is to write a single expression that will yield '/a/b' when passed the above paths list, and also produce '/a/b' when passed the following paths list: paths = ['/a/b/c', '/a/b/cd'] --RDM
On Tue, 30 Dec 2008 at 21:30, rdmurray@bitdance.com wrote:
On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
More trouble with the "just take the dirname":
paths = ['/a/b/c', '/a/b/d', '/a/b'] os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths]))
give '/a', not '/a/b'.
...because that's the correct answer.
But not the answer that is wanted.
So the challenge now is to write a single expression that will yield '/a/b' when passed the above paths list, and also produce '/a/b' when passed the following paths list:
paths = ['/a/b/c', '/a/b/cd']
Sorry, now I see what you are saying: that in '/a/b' the 'b' is the filename. Clearly that wasn't what I intuitively expected our notional 'commonpathprefix' command to produce, for whatever that is worth :) --RDM
At 09:30 PM 12/30/2008 -0500, rdmurray@bitdance.com wrote:
On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
More trouble with the "just take the dirname":
paths = ['/a/b/c', '/a/b/d', '/a/b'] os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths])) give '/a', not '/a/b'.
...because that's the correct answer.
But not the answer that is wanted.
So the challenge now is to write a single expression that will yield '/a/b' when passed the above paths list, and also produce '/a/b' when passed the following paths list:
paths = ['/a/b/c', '/a/b/cd']
Change that to [os.path.normpath(p)+'/' for p in paths] and you've got yourself a winner.
2008/12/31 Phillip J. Eby <pje@telecommunity.com>:
Change that to [os.path.normpath(p)+'/' for p in paths] and you've got yourself a winner.
s#'/'#os.sep# to make it work on Windows as well :-) Have we established yet that this is hard enough to get right to warrant a stdlib implementation? Paul
Phillip J. Eby wrote:
At 09:30 PM 12/30/2008 -0500, rdmurray@bitdance.com wrote:
On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
More trouble with the "just take the dirname":
paths = ['/a/b/c', '/a/b/d', '/a/b'] os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths])) give '/a', not '/a/b'.
...because that's the correct answer.
But not the answer that is wanted.
So the challenge now is to write a single expression that will yield '/a/b' when passed the above paths list, and also produce '/a/b' when passed the following paths list:
paths = ['/a/b/c', '/a/b/cd']
Change that to [os.path.normpath(p)+'/' for p in paths] and you've got yourself a winner.
Or possibly [os.path.normpath(p)+os.path.sep for p in paths]? regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
Phillip> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >> More trouble with the "just take the dirname": >> >> paths = ['/a/b/c', '/a/b/d', '/a/b'] >> os.path.dirname(os.path.commonprefix([ >> os.path.normpath(p) for p in paths])) >> >> give '/a', not '/a/b'. Phillip> ...because that's the correct answer. I don't understand. If you search for os.path.commonprefix at codesearch.google.com you'll find uses like this: if os.path.commonprefix([basedir, somepath]) != basedir: ... which leads me to believe that other people using the current function in the real world would be confused by your interpretation. Skip
At 08:57 PM 12/30/2008 -0600, skip@pobox.com wrote:
Phillip> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >> More trouble with the "just take the dirname": >> >> paths = ['/a/b/c', '/a/b/d', '/a/b'] >> os.path.dirname(os.path.commonprefix([ >> os.path.normpath(p) for p in paths])) >> >> give '/a', not '/a/b'.
Phillip> ...because that's the correct answer.
I don't understand. If you search for os.path.commonprefix at codesearch.google.com you'll find uses like this:
if os.path.commonprefix([basedir, somepath]) != basedir: ...
which leads me to believe that other people using the current function in the real world would be confused by your interpretation.
It never would've occurred to me to use it for that, versus checking for somepath.startswith(basedir+sep). The only thing I've ever used commonprefix for is to find the most-specific directory that contains all the specified paths. Never occurred to me that there was any other use for it, actually.
I propose a different solution to this commonprefix mess: eliminate the function altogether. It is apparently too confusing to get right. Regards, Martin
On Sat, 27 Dec 2008 10:58:07 am Nick Coghlan wrote:
skip@pobox.com wrote:
The doc for os.path.commonprefix states:
Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.
I remember encountering this in an earlier version of Python 2.x (maybe 2.2 or 2.3?) and "fixed" it to work by pathname components instead of by characters. That had to be reverted because it was a behavior change and broke code which used it for strings which didn't represent paths. After the reversion I then forgot about it.
I just stumbled upon it again. It seems to me this would have been a good thing to fix in 3.0. Is this something which could change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)?
Why can't we add an "allow_fragment" keyword that defaults to True? Then "allow_fragment=False" will stop at the last full directory name and ignore any partial matches on the filenames or the next subdirectory (depending on where the common prefix ends).
For what it's worth, I think that the two pieces of functionality are different enough that in an ideal world they should be two different functions rather than one function with a switch. I think os.path.commonprefix should only operate on path components, and if character-by-character prefix matching on general strings is useful, then it should be a string method. -- Steven D'Aprano
skip> I just stumbled upon it again. It seems to me this would have skip> been a good thing to fix in 3.0. Is this something which could skip> change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? Hmmm... I didn't really mean "deletion". I meant, could a behavior change be implemented in 3.2 with a warning emitted in 3.1? Skip
skip> I just stumbled upon it again. It seems to me this would have skip> been a good thing to fix in 3.0. Is this something which could skip> change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? This new issue in the tracker: http://bugs.python.org/issue4755 implements a commonpathprefix function. As explained in the submission this would be my second choice should it be decided that something should change. Skip
participants (12)
-
"Martin v. Löwis"
-
Alexander Belopolsky
-
Antoine Pitrou
-
Jeff Hall
-
Nick Coghlan
-
Paul Moore
-
Phillip J. Eby
-
rdmurray@bitdance.com
-
Scott David Daniels
-
skip@pobox.com
-
Steve Holden
-
Steven D'Aprano