[Python-ideas] Re: commonprefix

June 13, 2024

      [Rob Cliffe]
...
The os.path.commonprefix function basically returns the common initial
characters (if any) shared by a sequence of strings, e.g.
 ...
It seems to me that this function (or something similar) should not be
in os.path, but somewhere else where it would be more visible.
It's certainly in a strange place ;-)
...
(1) a str.commonprefix function.  Unfortunately it would have to be
used like this
             str.commonprefix(<sequence of strings>)
         which would make it [I think] the only str function which
couldn't be called as a string method.
Sure it could, like

    astr.commonprefix(*strs)

to find the longest common prefix among `astr` and the (zero or more)
optional arguments.
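
A minimal sketch of how that calling convention might behave, written as a free function because `str` can't be given new methods from Python code. The name `commonprefix_of` is made up for illustration, and delegating to os.path.commonprefix is just one convenient way to get the result:

```python
import os.path

def commonprefix_of(astr, *strs):
    # Hypothetical stand-in for the proposed astr.commonprefix(*strs):
    # the longest common prefix of astr and zero or more other strings.
    return os.path.commonprefix([astr, *strs])

print(commonprefix_of("interstellar", "internet", "interval"))
print(commonprefix_of("abc"))  # no optional arguments: the string itself
```

With no optional arguments the only candidate is `astr`, so it comes back unchanged.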
...
...
           One wrinkle: os.path.commonprefix, if passed an empty
sequence, returns an empty STRING.
               If this function were designed from scratch, it should
probably return None.
Don't think so. An empty string is a valid string too, and is the
"obviously correct" common prefix of "abc" and "xyz".
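
For reference, a quick check of the two cases being compared, using the existing os.path.commonprefix (the variable names are only for illustration):

```python
import os.path

# An empty sequence yields an empty string ...
empty_input = os.path.commonprefix([])

# ... which is also the result for strings that share no leading characters.
no_overlap = os.path.commonprefix(["abc", "xyz"])

print(repr(empty_input), repr(no_overlap))
```

Both results are the same `str` value, so callers don't need a separate `None` check.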
...
I also suspect that this function could be made more efficient.  It
sorts the sequence.  While sorting is very fast (thanks, Uncle Tim!) it
seems a bit OTT in this case.
It doesn't sort. It finds the min and the max of all the inputs, as
separate operations, and finds the longest common prefix of those two
alone. Apart from the initial min/max calls, the other inputs are never
looked at again. It's a matter of logical deduction that if S <= L have K
initial characters in common, then every string T with S <= T <= L must
have the same K initial characters.
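
The approach described above can be sketched as follows. This is an illustration of the min/max idea for plain strings, not a copy of the library source; the name `common_prefix` and the sample words are made up:

```python
def common_prefix(strings):
    # Sketch of the min/max approach for a sequence of str.
    if not strings:
        return ""
    s1 = min(strings)  # lexicographically smallest input
    s2 = max(strings)  # lexicographically largest input
    # Only these two are compared; every other input sorts between
    # them, so it must share whatever prefix they share.
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1  # loop ran to exhaustion: s2 starts with s1

words = ["carpet", "car", "cargo", "carbon"]
prefix = common_prefix(words)
print(prefix)
# The deduction, checked directly: every input starts with the prefix.
print(all(w.startswith(prefix) for w in words))
```

No sort is involved: min() and max() are each a single linear pass over the inputs.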

As a micro-optimization, you might think the min/max could be skipped if
there were only two input strings. But then

    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]

could blow up with an IndexError if len(s2) < len(s1). As is, that can't
happen, because s1 <= s2 is known. At worst, the `for` loop can run to
exhaustion (in which case s2.startswith(s1)).
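
A small demonstration of that failure mode, assuming the loop is applied to two strings in whatever order they arrive. The function name `naive_pair_prefix` is hypothetical:

```python
def naive_pair_prefix(s1, s2):
    # Same loop as above, but WITHOUT first ensuring s1 <= s2.
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

# Fine when the first argument happens to be the smaller string.
ok = naive_pair_prefix("abc", "abcd")

# Blows up when s1 is longer and s2 is a prefix of it.
try:
    naive_pair_prefix("abcd", "abc")
    raised = False
except IndexError:
    raised = True

print(ok, raised)
```

So skipping min/max for the two-string case would still need a comparison to put the pair in order, which gives back most of the saving.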

So, to my eye, there are surprisingly few possibilities for speeding this up.


Tim Peters