[Python-ideas] os.path.commonprefix: Yes that old chestnut.

Andrew Barnert abarnert at yahoo.com
Tue Mar 24 15:07:30 CET 2015


On Mar 24, 2015, at 6:51 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> On 24 March 2015 at 13:44, Andrew Barnert <abarnert at yahoo.com> wrote:
>>> Actually, in many ways, this is really a list (sequence) method -
>>> common_prefix - applied to the "parts" property of a Path. It's a
>>> shame there isn't a sequence utils module in the stdlib...
>> 
>> That's a good point. But do you really care that the result is a list (actually, isn't parts a tuple, not a list?), or just that it's some kind of iterable--or, even more generally, something you can make a Path object out of? Because there _is_ an iterable utils module in the stdlib, and I think the implementation is simpler if you think of it that way too:
>> 
>>    def common_prefix(x: Iterable[X], y: Iterable[X]) -> Iterator[X]:
>>        for a, b in zip(x, y):
>>            if a != b: return
>>            yield a
>> 
>> (Or, if you prefer, implement it as a chain of zip, takewhile, and map(itemgetter) then yield from the result.)
>> 
>> If you as a user want to turn that back into a tuple, you can, but normally you're just going to want to join them back up into a Path (or a type(p1)) without bothering with that.
> 
> I was thinking that there might be a reason it wouldn't work for
> arbitrary iterators, so you'd need at least a Sequence. But you're
> right, an itertool is sufficient. Although given the itertools focus
> on building blocks, it may end up being simply a recipe.
> 
>>> One thing my implementation doesn't (yet) handle is case sensitivity.
>>> The common prefix of WindowsPath('c:\\FOO\\bar') and
>>> WindowsPath('C:\\Foo\\BAR') should be WindowsPath('C:\\Foo').
>> 
>> Not 'c:\\FOO'? I'd expect the left one to win--especially if it's a method, so the left one is self.
> 
> Technically, WindowsPath('C:\\FOO') and WindowsPath('C:\\Foo') are the
> same, so I stand by what I said :-) But yeah, the natural
> implementation would give you the relevant part of self.
> 
>>> But not
>>> for PosixPath. (And again, when they are mixed, which is silly but
>>> possible, what behaviour should apply? "Work like self" is the obvious
>>> answer if we have a method).
>> 
>> Needless to say, an itertools (or "sequencetools") function that you call on parts does nothing to either help or hinder this problem. But it does seem to lend itself better to approaches where parts holds some new FooPathComponent type, or maybe a str on POSIX but a new CaseInsensitiveStr on Windows.
> 
> (p.__class__(pp) for pp in p.parts)

Sure, but then your whole expression looks something like:

    p1.__class__(*more_itertools.common_prefix(
        (p1.__class__(pp) for pp in p1.parts),
        (p2.__class__(pp) for pp in p2.parts)))

Which doesn't read quite as nicely as "just call an itertools function on the parts and construct a Path from them" sounds like it should.
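
For concreteness, here is a rough, self-contained sketch of that expression. common_prefix is the generator quoted earlier in the thread; common_path is a hypothetical wrapper name of my own, not something that exists in pathlib or itertools. It uses pure paths so it runs on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath


def common_prefix(x, y):
    """Yield leading items of x and y for as long as they compare equal."""
    for a, b in zip(x, y):
        if a != b:
            return
        yield a


def common_path(p1, p2):
    """Hypothetical helper: common leading path, compared by type(p1)'s rules."""
    cls = type(p1)
    # Wrap each part in a one-component path so that == follows the
    # flavour's own rules (case-insensitive for the Windows flavour).
    return cls(*common_prefix((cls(pp) for pp in p1.parts),
                              (cls(pp) for pp in p2.parts)))


# Usage: the left operand's spelling is the one that survives.
win = common_path(PureWindowsPath('c:\\FOO\\bar\\x'),
                  PureWindowsPath('C:\\Foo\\baz'))
posix = common_path(PurePosixPath('/FOO/bar'),
                    PurePosixPath('/Foo/bar'))
```

Because the generator yields items from its first argument, the result keeps the spelling of p1, which matches the "work like self" behaviour discussed above.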

Which implies that you'd probably want at least a recipe in the pathlib docs that referenced the recipe in the itertools docs or something.

And that many people who aren't on Windows just wouldn't bother and would write something non-portable until they got a complaint from a Windows user and found it worth investigating...
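
A small illustration of why the shortcut is tempting and where it goes wrong, assuming only the pure path classes from pathlib (the variable names are just for this sketch). The parts are plain strings, so comparing them directly ignores the flavour's case rules:

```python
from pathlib import PurePosixPath, PureWindowsPath

w1 = PureWindowsPath('c:\\FOO\\bar')
w2 = PureWindowsPath('C:\\Foo\\BAR')

# parts is a tuple of str, so this comparison is case-sensitive.
win_parts_equal = w1.parts == w2.parts
# The path objects themselves compare by Windows rules.
win_paths_equal = w1 == w2

u1 = PurePosixPath('/FOO/bar')
u2 = PurePosixPath('/Foo/BAR')

# For the POSIX flavour both comparisons agree, which is why code
# written and tested only on POSIX would not reveal the difference.
posix_parts_equal = u1.parts == u2.parts
posix_paths_equal = u1 == u2
```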

(While we're at it: most POSIX OSes can handle both case-sensitive and case-insensitive filesystems, and at least some OS X functions take that into account, although that may not be true at the BSD level, only at the POSIX level. For that matter, doesn't the HFS+ filesystem also consider two paths equal if they have the same canonical decomposition (a variant of NFD, I believe), even if they have different code points? But I guess if I'm remembering right, this would be no more or less broken than any other use of PosixPath on Mac, so it's not worth worrying about here, right?)
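
To show what the normalization point means in practice, here is a minimal sketch using unicodedata from the stdlib. NFD is used purely for illustration; the sketch makes no claim about exactly which form any particular filesystem applies.

```python
import unicodedata
from pathlib import PurePosixPath

composed = 'caf\u00e9'      # "e with acute" as a single code point
decomposed = 'cafe\u0301'   # "e" followed by a combining acute accent

# The two strings differ code point by code point...
raw_equal = composed == decomposed
# ...but are identical once both are put in the same normal form.
normalized_equal = (unicodedata.normalize('NFD', composed)
                    == unicodedata.normalize('NFD', decomposed))

# The POSIX flavour compares the strings as given, with no normalization,
# so it treats these as two different names.
paths_equal = PurePosixPath(composed) == PurePosixPath(decomposed)
```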

> (See the thread about parts containing strings - technically using
> Path objects is dodgy, but as long as you don't leak the working
> values out of your function it's perfectly adequate).

Yeah, I agree that it's safe here even though it isn't safe in general.

