[Python-ideas] os.path.commonprefix: Yes that old chestnut.
Paul Moore
p.f.moore at gmail.com
Tue Mar 24 14:51:30 CET 2015
On 24 March 2015 at 13:44, Andrew Barnert <abarnert at yahoo.com> wrote:
>> Actually, in many ways, this is really a list (sequence) method -
>> common_prefix - applied to the "parts" property of a Path. It's a
>> shame there isn't a sequence utils module in the stdlib...
>
> That's a good point. But do you really care that the result is a list (actually, isn't parts a tuple, not a list?), or just that it's some kind of iterable--or, even more generally, something you can make a Path object out of? Because there _is_ an iterable utils module in the stdlib, and I think the implementation is simpler if you think of it that way too:
>
> def common_prefix(x: Iterable[X], y: Iterable[X]) -> Iterator[X]:
>     for a, b in zip(x, y):
>         if a != b: return
>         yield a
>
> (Or, if you prefer, implement it as a chain of zip, takewhile, and map(itemgetter) then yield from the result.)
>
> If you as a user want to turn that back into a tuple, you can, but normally you're just going to want to join them back up into a Path (or a type(p1)) without bothering with that.
I was thinking that there might be a reason it wouldn't work for
arbitrary iterators, so you'd need at least a Sequence. But you're
right, an itertool is sufficient. Although given the itertools focus
on building blocks, it may end up being simply a recipe.
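
For reference, a runnable sketch of what such a recipe might look like,
in both the generator form and the zip/takewhile/map(itemgetter) form.
The name common_prefix_chain and the example paths are made up for
illustration, and nothing like this exists in itertools today:

```python
from itertools import takewhile
from operator import itemgetter
from pathlib import PurePosixPath
from typing import Iterable, Iterator, TypeVar

X = TypeVar("X")


def common_prefix(x: Iterable[X], y: Iterable[X]) -> Iterator[X]:
    """Yield the leading items shared by x and y, stopping at the first mismatch."""
    for a, b in zip(x, y):
        if a != b:
            return
        yield a


def common_prefix_chain(x: Iterable[X], y: Iterable[X]) -> Iterator[X]:
    """Same behaviour, built from itertools pieces (hypothetical name)."""
    # Keep pairs while both sides agree, then take the left item of each pair.
    matching = takewhile(lambda pair: pair[0] == pair[1], zip(x, y))
    return map(itemgetter(0), matching)


# Join the shared parts straight back into a path, with no intermediate tuple.
p1 = PurePosixPath("/usr/local/lib/python3")
p2 = PurePosixPath("/usr/local/share/man")
common = PurePosixPath(*common_prefix(p1.parts, p2.parts))
```

Since it only relies on zip, this works on arbitrary iterators, not
just sequences. If the paths share nothing, the constructor is called
with no arguments, which gives the "." path.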
>> One thing my implementation doesn't (yet) handle is case sensitivity.
>> The common prefix of WindowsPath('c:\\FOO\\bar') and
>> WindowsPath('C:\\Foo\\BAR') should be WindowsPath('C:\\Foo').
>
> Not 'c:\\FOO'? I'd expect the left one to win--especially if it's a method, so the left one is self.
Technically, WindowsPath('C:\\FOO') and WindowsPath('C:\\Foo') are the
same, so I stand by what I said :-) But yeah, the natural
implementation would give you the relevant part of self.
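
A quick check of that claim, using the pure path classes so it runs on
any platform (the variable names are only for illustration):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows paths compare case-insensitively...
same_on_windows = PureWindowsPath("C:\\FOO") == PureWindowsPath("C:\\Foo")

# ...while POSIX paths compare case-sensitively.
same_on_posix = PurePosixPath("/FOO") == PurePosixPath("/Foo")

# Equality ignores case on Windows, but each object keeps its own spelling.
spelling = str(PureWindowsPath("C:\\FOO"))
```

So the two results are equal as paths, and the only question is which
spelling you get back.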
>> But not
>> for PosixPath. (And again, when they are mixed, which is silly but
>> possible, what behaviour should apply? "Work like self" is the obvious
>> answer if we have a method).
>
> Needless to say, an itertools (or "sequencetools") function that you call on parts does nothing to either help or hinder this problem. But it does seem to lend itself better to approaches where parts holds some new FooPathComponent type, or maybe a str on POSIX but a new CaseInsensitiveStr on Windows.
(p.__class__(pp) for pp in p.parts)
(See the thread about parts containing strings - technically using
Path objects is dodgy, but as long as you don't leak the working
values out of your function it's perfectly adequate).
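
A minimal sketch of that idea, assuming a free function rather than a
method. The name common_path_prefix and the example paths are
hypothetical, and this is not the implementation posted earlier in the
thread:

```python
from pathlib import PurePosixPath, PureWindowsPath


def common_path_prefix(p, q):
    """Return the leading parts of p that also lead q, compared the way p compares."""
    cls = type(p)  # "work like self": p's flavour decides case sensitivity
    kept = []
    for a, b in zip(p.parts, q.parts):
        # Wrap each part in a throwaway path object so the class's own
        # equality rules apply; these objects never leave the function.
        if cls(a) != cls(b):
            break
        kept.append(a)  # keep p's spelling of the shared part
    return cls(*kept)


win = common_path_prefix(
    PureWindowsPath("c:\\FOO\\bar"), PureWindowsPath("C:\\Foo\\baz")
)
posix = common_path_prefix(
    PurePosixPath("/FOO/bar"), PurePosixPath("/Foo/bar")
)
```

Here win keeps the left-hand spelling, 'c:\\FOO', but still compares
equal to WindowsPath('C:\\Foo'), while posix stops at '/' because FOO
and Foo differ there.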
Paul