[Python-ideas] PEP 428 - object-oriented filesystem paths

Sat Oct 6 19:08:21 CEST 2012

On Sat, 6 Oct 2012 12:14:40 -0400
Calvin Spealman <ironfroggy at gmail.com>
wrote:
> 
> It feels like this proposal is "make it object oriented, because
> object oriented is good" without any actual justification or obvious
> problem this solves. The API looks clunky and redundant, and does not
> appear to actually improve anything over the facilities in the os.path
> module.

Personally, I cringe every time I have to type
`os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
directories upwards of a given path. Compare with, say:

>>> p = Path('/a/b/c/d')
>>> p.parent(2)
PosixPath('/a/b')
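
For a side-by-side comparison, here is a minimal sketch of the same
operation in both styles. It assumes the pathlib module as it later
shipped in the standard library, where the spelling is `parents[1]`
rather than the draft `parent(2)` shown above. The variable names are
illustrative only.

```python
import posixpath
from pathlib import PurePosixPath

raw = '/a/b/c/d'

# os.path style: one nested call per level
# (posixpath is the POSIX flavour of os.path, usable on any platform).
two_up_str = posixpath.dirname(posixpath.dirname(raw))

# pathlib style: parents[0] is the immediate parent,
# parents[1] is two levels up.
two_up_path = PurePosixPath(raw).parents[1]

print(two_up_str)   # /a/b
print(two_up_path)  # /a/b
```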

Really, I don't think os.path is the prettiest or most convenient
"battery" in the stdlib.

> This takes a lot of things we can already do with paths and
> files and remixes them into a not-so intuitive API for the sake of
> change, not for the sake of solving a real problem.

Ironing out difficulties such as platform-specific case-sensitivity
rules or the various path separators is a real problem that is not
solved by an os.path-like API, because you can't muck with str and give
it the required semantics for a filesystem path. So people end up
sprinkling their code with calls to os.path.normpath() and/or
os.path.normcase() in the hope that it will appease the Gods of
Portability (and normcase() also loses casing information).
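
To make the point concrete, here is a small sketch contrasting the two
approaches on Windows-style paths. It assumes the pathlib module as it
later shipped in the standard library (`PureWindowsPath`), and uses
`ntpath` (the Windows flavour of os.path) so that it behaves the same on
any platform. The sample paths are made up.

```python
import ntpath
from pathlib import PureWindowsPath

a = 'C:/Foo/bar'
b = 'c:\\foo\\BAR'

# Plain strings: the two spellings differ until both are normalised,
# and the normalised form no longer remembers the original casing.
equal_as_str = (a == b)                  # False
normalised = ntpath.normcase(a)          # 'c:\\foo\\bar'
equal_after_normcase = (normalised == ntpath.normcase(b))  # True

# Path objects: compared with Windows semantics, original spelling kept.
pa, pb = PureWindowsPath(a), PureWindowsPath(b)
equal_as_path = (pa == pb)               # True
print(str(pa))                           # C:\Foo\bar
```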

> Not inheriting from str means that we can't directly pass these path
> objects to existing code that just expects a string, so we have a
> really hard boundary around the edges of this new API. It does not
> lend itself well to incrementally transitioning to it from existing
> code.

As discussed in the PEP, I consider inheriting from str to be a mistake
when your intent is to provide different semantics from str.

Why should indexing or iterating over a path produce individual
characters?
Why should Path.split() split over whitespace by default?
Why should "c:\\" be considered unequal to "C:\\" under Windows? 
Why should startswith() work character by character, rather than path
component by path component?

These are all standard str behaviours that are unhelpful when applied
to filesystem paths.
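
A short sketch of what those str behaviours look like in practice, next
to component-wise equivalents. The second half assumes the pathlib
module as it later shipped in the standard library; the sample path is
made up.

```python
from pathlib import PurePosixPath

s = '/usr/local/my docs'
p = PurePosixPath(s)

# str semantics: characters, whitespace, and character-wise prefixes.
chars = list(s)[:4]                      # ['/', 'u', 's', 'r']
words = s.split()                        # ['/usr/local/my', 'docs']
str_prefix = s.startswith('/usr/lo')     # True, although '/usr/lo' is not an ancestor

# Path semantics: whole components only.
components = p.parts                     # ('/', 'usr', 'local', 'my docs')
bogus_ancestor = PurePosixPath('/usr/lo') in p.parents      # False
real_ancestor = PurePosixPath('/usr/local') in p.parents    # True
```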

As for the transition, you just have to call str() on the path object.
Since str() also works on plain str objects (and is a no-op), it seems
rather painless to me.
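
As a rough illustration of that boundary, assuming the pathlib module as
it later shipped in the standard library: `legacy_basename` below is a
hypothetical piece of existing code that only understands str, and
`basename_any` is a hypothetical wrapper that accepts either type.

```python
from pathlib import PurePosixPath

def legacy_basename(path):
    """Hypothetical existing code that relies on str methods."""
    return path.rpartition('/')[2]

def basename_any(path):
    """Accept a str or a path object; str() leaves a plain str unchanged."""
    return legacy_basename(str(path))

from_str = basename_any('/a/b/c.txt')
from_path = basename_any(PurePosixPath('/a/b/c.txt'))
print(from_str, from_path)  # c.txt c.txt
```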

(Of course, you are not forced to transition. The PEP doesn't call for
deprecation of os.path.)

> The stat operations and other file-facilities tacked on feel out of
> place, and limited. Why does it make sense to add these facilities to
> path and not other file operations? Why not give me a read method on
> paths? or maybe a copy?

There is always room to improve and complete the API without breaking
compatibility. To quote the PEP: “More operations could be provided,
for example some of the functionality of the shutil module”.

The focus of the PEP is not to enumerate every possible file operation,
but to propose the semantic and syntactic foundations (such as how to
join paths, how to divide them into their individual components, etc.).
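
For a flavour of what such foundations can look like, here is a sketch
using the pathlib module as it later shipped in the standard library.
The exact spellings in the PEP draft under discussion may differ, and
the sample path is made up.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/srv/data')

# Joining: the / operator appends components.
full = base / 'reports' / 'q3.tar.gz'

# Dividing: each piece is available as an attribute.
print(full)          # /srv/data/reports/q3.tar.gz
print(full.parts)    # ('/', 'srv', 'data', 'reports', 'q3.tar.gz')
print(full.parent)   # /srv/data/reports
print(full.name)     # q3.tar.gz
print(full.suffix)   # .gz
```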

> Putting lots of file facilities on a path
> object feels wrong because you can't extend it easily. This is one
> place that function(thing) works better than thing.function()

But you can still define a function() taking a Path as an argument, if
you need to.
Similarly, you can define a function() taking a datetime object if the
datetime object's API lacks some useful functionality for you.
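
A minimal sketch of that idea, assuming the pathlib module as it later
shipped in the standard library. `count_lines` is a hypothetical helper,
not something the PEP proposes.

```python
import tempfile
from pathlib import Path

def count_lines(path):
    """Hypothetical free function that extends what you can do with a Path."""
    with path.open() as f:
        return sum(1 for _ in f)

# Exercise the helper on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_text('one\ntwo\nthree\n')
    n = count_lines(sample)

print(n)  # 3
```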

Regards

Antoine.

-- 
Software development and contracting: http://pro.pitrou.net