
The os.path.commonprefix function basically returns the common initial characters (if any) shared by a sequence of strings, e.g.

    os.path.commonprefix(("Python is great!", "Python is not bad", "Python helps"))  # returns "Python "

This is purely a string manipulation function which bears no relation to actual file paths or whether they exist.

I have found uses for it unrelated to file paths:
(a) When updating a progress display by printing backspaces and overtyping, discover the common characters between the old and new values displayed, so as to print as few characters as possible.
(b) Find the first character position where two strings differ.
I am sure there are many others.

It seems to me that this function (or something similar) should not be in os.path, but somewhere else where it would be more visible. (There are a lot of "solutions" on the Internet to finding the common prefix of two or more strings, whose authors obviously don't know about os.path.commonprefix.) I'm not sure where, though. Possibilities include:

(1) A str.commonprefix function. Unfortunately it would have to be used like this:

    str.commonprefix(<sequence of strings>)

which would make it [I think] the only str function which couldn't be called as a string method.

(2) In the sequence protocol. This would mean that it could apply not just to strings, but to any sequences. E.g.

    [ (1,2,3), (1,2,4) ].commonprefix()  # Would return (1,2).
    [ (1,2,3), [1,2,4] ].commonprefix()  # Mixed sequence types - undefined?

One wrinkle: os.path.commonprefix, if passed an empty sequence, returns an empty STRING. If this function were designed from scratch, it should probably return None.

(3) A builtin commonprefix function. Again it could apply to any sequences.

I also suspect that this function could be made more efficient. It sorts the sequence. While sorting is very fast (thanks, Uncle Tim!) it seems a bit OTT (over the top) in this case.

Thoughts?

Best wishes
Rob Cliffe
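For concreteness, here is a rough sketch of the two non-path uses (a) and (b), built on the existing os.path.commonprefix. The helper names `first_difference` and `overtype_update` are made up for illustration, and the overtyping helper assumes a terminal that honours the backspace character. The demo at the end also shows how today's function treats tuples, an empty input, and mixed list/tuple input.

```python
import os.path


def first_difference(a: str, b: str) -> int:
    """Index of the first position where `a` and `b` differ (hypothetical helper).

    If one string is a prefix of the other, this is the length of the shorter one.
    """
    return len(os.path.commonprefix([a, b]))


def overtype_update(old: str, new: str) -> str:
    """Text to print so that `old`, already on screen, becomes `new`.

    Hypothetical helper: backs up only over the characters that change,
    writes the new tail, then blanks any leftover characters of `old`
    and moves the cursor back over the blanks.
    """
    keep = first_difference(old, new)
    erase = len(old) - keep
    tail = new[keep:]
    # Characters of `old` that the new tail will not overwrite.
    leftover = max(0, len(old) - len(new))
    return "\b" * erase + tail + " " * leftover + "\b" * leftover


if __name__ == "__main__":
    print(first_difference("Python is great!", "Python helps"))  # 7
    print(repr(overtype_update("Progress: 19%", "Progress: 20%")))  # '\x08\x08\x0820%'
    # The existing function already works on sequences of tuples:
    print(os.path.commonprefix([(1, 2, 3), (1, 2, 4)]))  # (1, 2)
    print(repr(os.path.commonprefix([])))  # '' (an empty *string*)
    try:
        os.path.commonprefix([(1, 2, 3), [1, 2, 4]])
    except TypeError as exc:
        # Mixed list/tuple inputs cannot be ordered by min()/max().
        print("TypeError:", exc)
```

Tuples work today because the function only applies os.fspath to the items when the first item is not a list or tuple; the mixed case currently fails with a TypeError rather than being "undefined".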

[Rob Cliffe]
It's certainly in a strange place ;-)
Sure it could, like astr.commonprefix(*strs), to find the longest common prefix among `astr` and the (zero or more) optional arguments.
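A minimal sketch of that method form. Built-in str cannot be given new methods, so this uses a hypothetical free function with a method-like signature, plus a throwaway str subclass purely to show the call syntax; neither exists in Python.

```python
import os.path


def str_commonprefix(self: str, *others: str) -> str:
    """Hypothetical stand-in for a `self.commonprefix(*others)` str method.

    Returns the longest common prefix of `self` and the (zero or more)
    other strings. With no other arguments, that is `self` itself.
    """
    # `self` is always present, so the sequence passed on is never empty.
    return os.path.commonprefix((self, *others))


class PrefixStr(str):
    """Hypothetical str subclass, only to demonstrate the call syntax."""

    def commonprefix(self, *others: str) -> str:
        return str_commonprefix(str(self), *others)


if __name__ == "__main__":
    print(PrefixStr("abcdef").commonprefix("abcxyz", "abq"))  # ab
    print(PrefixStr("abc").commonprefix())  # abc
```

Because the receiver is always there, the method form never sees an empty input, which sidesteps the empty-sequence question entirely.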
Don't think so. An empty string is a valid string too, and is the "obviously correct" common prefix of "abc" and "xyz".
It doesn't sort. It finds the min and the max of all the inputs, as separate operations, and finds the longest common prefix of those two alone. Apart from the initial min/max calls, the other inputs are never looked at again. It's a matter of logical deduction that if S <= L have K initial characters in common, then every string T with S <= T <= L must have the same K initial characters.

As a micro-optimization, you might think the min/max could be skipped if there were only two input strings. But then

    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]

could blow up with an IndexError if len(s2) < len(s1). As is, that can't happen, because s1 <= s2 is known. At worst, the `for` loop can run to exhaustion (in which case s2.startswith(s1)).

So, to my eye, there are surprisingly few possibilities for speeding this.
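The min/max argument can be checked mechanically. The sketch below (all function names are illustrative) reproduces the loop described above, compares it on random inputs against a brute-force version that inspects every string at every position, and confirms that the two-string shortcut without min/max raises IndexError when the second string is a proper prefix of the first.

```python
import random
from typing import Sequence


def commonprefix_minmax(strs: Sequence[str]) -> str:
    """Longest common prefix, comparing only the min and max strings.

    Every other input lies between them in sort order, so it must share
    whatever prefix they share.
    """
    if not strs:
        return ""
    s1 = min(strs)
    s2 = max(strs)
    for i, c in enumerate(s1):
        # s1 <= s2, so s2 cannot run out before a mismatch is found.
        if c != s2[i]:
            return s1[:i]
    return s1  # loop ran to exhaustion: s2.startswith(s1)


def commonprefix_bruteforce(strs: Sequence[str]) -> str:
    """Reference version that looks at every string at every position."""
    prefix = []
    for column in zip(*strs):  # stops at the shortest string
        if any(ch != column[0] for ch in column):
            break
        prefix.append(column[0])
    return "".join(prefix)


def naive_two_string(s1: str, s2: str) -> str:
    """The min/max-free shortcut; fails when s2 is a proper prefix of s1."""
    for i, c in enumerate(s1):
        if c != s2[i]:  # IndexError once i reaches len(s2)
            return s1[:i]
    return s1


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(10_000):
        strs = ["".join(rng.choices("ab", k=rng.randint(0, 5)))
                for _ in range(rng.randint(1, 5))]
        assert commonprefix_minmax(strs) == commonprefix_bruteforce(strs)
    try:
        naive_two_string("abc", "ab")
    except IndexError:
        print("IndexError, as predicted")
```

A small alphabet ("ab") is used deliberately so that random strings share long prefixes often, which exercises the interesting cases.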

This being in the `os.path` module suggests that the main intent is to find a common prefix of a `path`. If this is the case, maybe it would be worth having, instead of:
```
if not isinstance(m[0], (list, tuple)):
    m = tuple(map(os.fspath, m))
```
this:
```
if not isinstance(m[0], (list, tuple)):
    m = [os.fspath(el).split(SEP) for el in m]
    …
```
As it is now (from the tests):
```
commonprefix([b"home:swenson:spam", b"home:swen:spam"]) -> b"home:swen"
```
which is not a common prefix of a path.

If my suggestion above is valid and it was intended to be used on parts of a `path`, then `os.path` is the right place for it. But it was made too flexible, which makes it suitable for general string problems, but error-prone for path problems.

The way I see it, ideally there should be:
1. a string method
2. a sequence method
3. a path utility

The current `commonprefix` does 1 and 2 very well, but 3 is error-prone.

Regards,
dg
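A rough sketch of the component-wise idea, assuming SEP is a single separator of the same type as the paths (str or bytes). `component_commonprefix` is a hypothetical name; it relies on the fact that os.path.commonprefix already accepts lists of parts. It does no normalisation and ignores edge cases such as the root path and mixing absolute with relative paths. For comparison, the standard library's os.path.commonpath (Python 3.5+) already works on whole components.

```python
import os
import posixpath


def component_commonprefix(paths, sep="/"):
    """Common leading *components* of `paths`, split on `sep`.

    Hypothetical helper: each path goes through os.fspath, is split on
    `sep` (str or bytes, matching the paths), and os.path.commonprefix is
    applied to the resulting lists of parts. The surviving parts are
    joined back together with `sep`.
    """
    if not paths:
        return sep[:0]  # empty str or bytes, matching the type of `sep`
    parts = [os.fspath(p).split(sep) for p in paths]
    return sep.join(os.path.commonprefix(parts))


if __name__ == "__main__":
    paths = ["/home/swenson/spam", "/home/swen/spam"]
    print(os.path.commonprefix(paths))    # /home/swen  (character-wise)
    print(component_commonprefix(paths))  # /home       (component-wise)
    print(posixpath.commonpath(paths))    # /home
    print(component_commonprefix([b"home:swenson:spam", b"home:swen:spam"], sep=b":"))  # b'home'
```

The posixpath module is used in the demo so the output does not depend on the host platform's separator.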

participants (3)
- Dom Grigonis
- Rob Cliffe
- Tim Peters