[Python-ideas] os.path.commonprefix: Yes that old chestnut.
Andrew Barnert
abarnert at yahoo.com
Tue Mar 24 15:44:40 CET 2015
On Mar 24, 2015, at 6:55 AM, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
>
>> On Mar 24, 2015, at 4:56 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>>
>>> On 23 March 2015 at 21:33, Gregory P. Smith <greg at krypto.org> wrote:
>>> +1 pathlib would be the appropriate place for the correctly behaving
>>> function to appear.
>>
>> OK, so here's a question. What actual use cases exist for a
>> common_prefix function? The reason I ask is that I'm looking at some
>> of the edge cases, and the obvious behaviour isn't particularly clear
>> to me.
>>
>> For example, common_prefix('a/b/file.c', 'a/b/file.c'). The common
>> prefix is obviously 'a/b/file.c' - but I can imagine people *actually*
>> wanting the common *directory* containing both files. But taken
>> literally, that's only possible if you check the filesystem, so it
>> would no longer be a PurePath operation.
>
> The traditional way to handle this is that the basename (the part after the last '/') is assumed to be a file (if you don't want that, include the trailing slash). POSIX even defines the technical term "path prefix" to mean everything up to the last slash, so something called a "common path prefix" sounds like it should be the common prefix of the path prefixes, right? Except that not every command and function in POSIX works this way, requiring you to memorize or look up the man page to see what someone chose as "obvious" back in the 1970s....
Sorry, forgot to fill in the cite. See 3.2.69 at http://pubs.opengroup.org/stage7tc1/basedefs/V1_chap03.html for the 2008 definition of "path prefix".
> At any rate, we probably don't need to figure this out from first principles; I'm pretty sure some subset of {Java, Boost, Cocoa, .NET, JUCE, one overwhelmingly popular CPAN library, etc.} have already come up with an answer, and if most of them agree, we probably want to follow suit (even if it seems silly).
From a quick search, it looks like many other languages don't define a common prefix path method/function, but many do define a generic iterable common-prefix function (or, as in C++, a first-mismatch function plus random-access iterables, so building one is trivial).
This implies that the obvious solution in most languages will include the basename, not skip it. And it looks like http://rosettacode.org/wiki/Find_common_directory_path agrees with that.
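To make the distinction concrete, here is a minimal sketch comparing the character-wise os.path.commonprefix with a component-wise comparison that includes the basename. The helper name shared_parts is hypothetical (it is not part of pathlib or os.path), and PurePosixPath is used only so the result doesn't depend on the platform:

```python
import os.path
from itertools import takewhile
from pathlib import PurePosixPath


def shared_parts(*paths):
    """Hypothetical helper: leading path components shared by every path."""
    # Line the components up column by column; zip stops at the shortest path.
    columns = zip(*(PurePosixPath(p).parts for p in paths))
    # Keep columns only while every path has the same component there.
    return tuple(col[0] for col in takewhile(lambda col: len(set(col)) == 1, columns))


# Character-wise comparison can stop in the middle of a component.
charwise = os.path.commonprefix(["/usr/lib", "/usr/local/lib"])  # '/usr/l'

# Component-wise comparison stops at a component boundary.
componentwise = shared_parts("/usr/lib", "/usr/local/lib")  # ('/', 'usr')

# Identical paths: the basename is part of the result, not skipped.
same_file = shared_parts("a/b/file.c", "a/b/file.c")  # ('a', 'b', 'file.c')
```

Anyone who wants the "common directory" reading instead would have to drop the last component themselves when it names a file, which is exactly the bit a pure path operation can't decide.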
From my other reply, looking over the functions used in some of the rosettacode examples, it looks like the generic iterable function, when it exists, and the language makes it feasible, often handles an arbitrary number of arguments, not just two. Which makes sense, now that I think about it. So:
    def common_prefix(*iterables):
        for first, *rest in zip(*iterables):
            if any(first != part for part in rest):
                return
            yield first
And of course to wrap it up:
    def common_path_prefix(*paths):
        return paths[0].__class__(
            *(common_prefix(*((path.__class__(p) for p in path.parts) for path in paths))))
Except I'll bet I got the parens wrong somewhere; it's probably clearer as:
    def common_path_prefix(*paths):
        parts = (map(path.__class__, path.parts) for path in paths)
        prefix = common_prefix(*parts)
        return paths[0].__class__(*prefix)
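For anyone who wants to try it, here is the same pair of functions as a self-contained sketch with a few calls. The choice of PurePosixPath and the sample paths are just assumptions for illustration; nothing here touches the filesystem:

```python
from pathlib import PurePosixPath


def common_prefix(*iterables):
    # Yield leading items for as long as every iterable agrees on them.
    for first, *rest in zip(*iterables):
        if any(first != part for part in rest):
            return
        yield first


def common_path_prefix(*paths):
    # Wrap each component in the path class so comparison uses that
    # class's own equality rules.
    parts = (map(path.__class__, path.parts) for path in paths)
    prefix = common_prefix(*parts)
    return paths[0].__class__(*prefix)


P = PurePosixPath

# Identical paths: the result includes the basename.
same = common_path_prefix(P("a/b/file.c"), P("a/b/file.c"))

# More than two arguments work too.
several = common_path_prefix(P("/usr/lib/x"), P("/usr/local/lib"), P("/usr/share"))

# '..' is not collapsed, so these two share no leading component and the
# result is the empty path, which pathlib renders as '.'.
dotdot = common_path_prefix(P("bar/baz"), P("foo/../bar/baz"))
```

Here same is a/b/file.c, several is /usr, and dotdot is the empty path.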
>> And what about common_prefix('foo/bar', '../here/foo')? Or
>> common_prefix('bar/baz', 'foo/../bar/baz')? Pathlib avoids collapsing
>> .. because the meaning could change in the face of symlinks. I believe
>> the same applies here. Maybe you need to call resolve() before doing
>> the common prefix operation (but that gives an absolute path).
>>
>> With the above limitations, would a common_prefix function actually
>> help typical use cases? In my experience, doing list operations on
>> pathobj.parts is often simple enough that I don't need specialised
>> functions like common_prefix...
>>
>> Getting the edge cases right is fiddly enough for common_prefix that a
>> specialised function is a reasonable idea, but only if the "obvious"
>> behaviour is clear. If there's a lot of conflicting possibilities,
>> maybe a recipe in the docs would be a better option.
>>
>> Paul
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/