[Python-ideas] os.path.commonprefix: Yes that old chestnut.
Andrew Barnert
abarnert at yahoo.com
Tue Mar 24 14:44:16 CET 2015
On Mar 24, 2015, at 3:05 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>
>> On 24 March 2015 at 01:54, <random832 at fastmail.us> wrote:
>>> On Mon, Mar 23, 2015, at 18:48, Paul Moore wrote:
>>> The type of the return value should
>>> be a concrete type - probably type(p1)?
>>
>> I'd argue it should be the common supertype, so a PurePosixPath if both
>> are posix and one is pure, a Path or PurePath if one is windows and the
>> other is posix.
>
> That's not really possible in the face of the possibility that an
> argument could be a user-defined class, possibly not even a
> pathlib.Path subclass (given duck typing). That's a clear benefit of a
> Path method, actually - the type of the return value is easy to
> specify - it's the type of self.
>
> Actually, in many ways, this is really a list (sequence) method -
> common_prefix - applied to the "parts" property of a Path. It's a
> shame there isn't a sequence utils module in the stdlib...
That's a good point. But do you really care that the result is a list (actually, isn't parts a tuple, not a list?), or just that it's some kind of iterable, or, even more generally, something you can make a Path object out of? Because there _is_ an iterable utils module in the stdlib (itertools), and I think the implementation is simpler if you think of it that way too:
    from typing import Iterable, Iterator, TypeVar
    X = TypeVar("X")

    def common_prefix(x: Iterable[X], y: Iterable[X]) -> Iterator[X]:
        for a, b in zip(x, y):
            if a != b:
                return  # first mismatch ends the prefix
            yield a
(Or, if you prefer, implement it as a chain of zip, takewhile, and map(itemgetter) then yield from the result.)
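For what it's worth, here is a rough sketch of that chained version. It assumes == and != agree for the element type (takewhile needs the "still equal" test rather than the "differs" test), and it takes the left element of each pair, same as the loop version:

```python
from itertools import takewhile
from operator import itemgetter


def common_prefix(x, y):
    # Pair the elements up, keep pairs while both sides are equal,
    # then take the left-hand element of each surviving pair.
    pairs = takewhile(lambda pair: pair[0] == pair[1], zip(x, y))
    yield from map(itemgetter(0), pairs)


prefix = list(common_prefix(("usr", "lib", "python"), ("usr", "lib", "perl")))
```

Whether that reads better than the explicit loop is a matter of taste; the behaviour should be the same for ordinary element types.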
If you as a user want to turn that back into a tuple, you can, but normally you're just going to want to join them back up into a Path (or a type(p1)) without bothering with that.
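As a sketch of that joining step (join_common is a made-up name for illustration, not anything in pathlib), assuming both arguments expose a parts tuple the way pathlib paths do and that the result should take the type of the first argument:

```python
from pathlib import PurePosixPath


def common_prefix(x, y):
    for a, b in zip(x, y):
        if a != b:
            return
        yield a


def join_common(p1, p2):
    # Hypothetical helper: rebuild a path of the left operand's type
    # from the shared leading components.
    return type(p1)(*common_prefix(p1.parts, p2.parts))


p1 = PurePosixPath("/usr/lib/python3")
p2 = PurePosixPath("/usr/lib/perl")
result = join_common(p1, p2)
```

One thing to notice: when nothing is shared, type(p1)() is called with no arguments, which pathlib treats as the path '.', and that may or may not be what a caller wants.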
> One thing my implementation doesn't (yet) handle is case sensitivity.
> The common prefix of WindowsPath('c:\\FOO\\bar') and
> WindowsPath('C:\\Foo\\BAR') should be WindowsPath('C:\\Foo').
Not 'c:\\FOO'? I'd expect the left one to win--especially if it's a method, so the left one is self.
> But not
> for PosixPath. (And again, when they are mixed, which is silly but
> possible, what behaviour should apply? "Work like self" is the obvious
> answer if we have a method).
Needless to say, an itertools (or "sequencetools") function that you call on parts does nothing to either help or hinder this problem. But it does seem to lend itself better to approaches where parts holds some new FooPathComponent type, or maybe a str on POSIX but a new CaseInsensitiveStr on Windows.
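To make the CaseInsensitiveStr idea a bit more concrete, here is a minimal sketch. The class is purely hypothetical (nothing like it exists in pathlib), and using casefold as the comparison rule is my assumption rather than what Windows filesystems actually do:

```python
from pathlib import PureWindowsPath


class CaseInsensitiveStr(str):
    """Hypothetical component type: a str that compares by casefold()."""

    def __eq__(self, other):
        if isinstance(other, str):
            return self.casefold() == other.casefold()
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it must be overridden explicitly
        # or != would still be case-sensitive.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return hash(self.casefold())


def common_prefix(x, y):
    for a, b in zip(x, y):
        if a != b:
            return
        yield a  # the left-hand spelling wins


left = PureWindowsPath("c:\\FOO\\bar")
right = PureWindowsPath("C:\\Foo\\baz")
parts = common_prefix(
    map(CaseInsensitiveStr, left.parts),
    map(CaseInsensitiveStr, right.parts),
)
# Convert back to plain str before rebuilding the path.
result = PureWindowsPath(*map(str, parts))
```

Because the generic function yields the element from its first argument, the result keeps the left path's spelling, which matches "work like self" if this were a method.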