[Python-Dev] PEP 355 status

glyph at divmod.com
Sun Oct 1 08:09:02 CEST 2006

On Sun, 01 Oct 2006 13:56:53 +1000, Nick Coghlan <ncoghlan at gmail.com> wrote:

>Things the PEP 355 path object lumps together:
>   - string manipulation operations
>   - abstract path manipulation operations (work for non-existent filesystems)
>   - read-only traversal of a concrete filesystem (dir, stat, glob, etc)
>   - addition & removal of files/directories/links within a concrete filesystem
>Dumping all of these into a single class is certainly practical from a utility
>point of view, but it's about as far away from beautiful as you can get, which
>creates problems from a learnability point of view, and from a
>capability-based security point of view. PEP 355 itself splits the methods up
>into 11 distinct categories when listing the interface.
>At the very least, I would want to split the interface into separate abstract
>and concrete interfaces. The abstract object wouldn't care whether or not the
>path actually existed on the current filesystem (and hence could be relied on
>to never raise IOError), whereas the concrete object would include the many
>operations that might need to touch the real IO device. (the PEP has already
>made a step in the right direction here by removing the methods that accessed
>a file's contents, leaving that job to the file object where it belongs).
>There's a case to be made for the abstract object inheriting from str or
>unicode for compatibility with existing code,

I think that compatibility can be achieved by having a "pathname" string attribute or similar to convert to a string when appropriate.  It's not like datetime inherits from str to facilitate formatting or anything like that.
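
A minimal sketch of that idea, assuming a hypothetical `AbstractPath` class (the class and its `child` method are illustrative names, not part of PEP 355). The object does not inherit from str; callers pass its `pathname` attribute to existing string-based APIs explicitly:

```python
import os


class AbstractPath:
    """Hypothetical path object that wraps a string instead of subclassing str."""

    def __init__(self, pathname):
        # The only state is the path name itself, kept as a plain string.
        self.pathname = str(pathname)

    def child(self, name):
        # Pure path manipulation: never touches the filesystem.
        return AbstractPath(os.path.join(self.pathname, name))

    def __repr__(self):
        return "AbstractPath(%r)" % (self.pathname,)


p = AbstractPath("spam").child("eggs.txt")

# Existing string-based APIs receive the plain string explicitly.
base = os.path.basename(p.pathname)
print(base)
```

The conversion is explicit at each call site, much as a datetime is formatted on request rather than being a string itself.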

>but an alternative would be to
>enhance the standard library to better support the use of non-basestring
>objects to describe filesystem paths. A PEP should at least look into what
>would have to change at the Python API level and the C API level to go that
>route rather than the inheritance route.

In C, this is going to be really difficult.  Existing C APIs want to use C functions to deal with pathnames, and many libraries are not going to support arbitrary VFS I/O operations.  For some libraries, like GNOME or KDE, you'd have to use the appropriate VFS object for their platform.

>For the concrete interface, the behaviour is very dependent on whether the
>path refers to a file, directory or symlink on the current filesystem. For an
>OO filesystem interface, does it really make sense to leave them all lumped
>into the one class with a bunch of isdir() and islink() style methods? Or does
>it make more sense to have a method on the abstract object that will return
>the appropriate kind of filesystem info object? 

I don't think returning different types of objects makes sense.  This sort of typing is inherently prone to race conditions.  If you get a "DirectoryPath" object in Python, and then the underlying filesystem changes so that the name that used to be a directory is now a file (or a device, or UNIX socket, or whatever), how do you change the underlying type?
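
The race can be sketched as follows. `DirectoryPath`, `OtherPath`, and `classify` are hypothetical names invented for illustration; the point is only that the Python type is chosen once, from a snapshot of the filesystem, and cannot follow later changes:

```python
import os
import tempfile


class DirectoryPath:
    """Hypothetical type for a name that was a directory when inspected."""

    def __init__(self, pathname):
        self.pathname = pathname


class OtherPath:
    """Hypothetical type for a name that was anything else when inspected."""

    def __init__(self, pathname):
        self.pathname = pathname


def classify(pathname):
    # The type is decided from the filesystem state at this instant only.
    if os.path.isdir(pathname):
        return DirectoryPath(pathname)
    return OtherPath(pathname)


with tempfile.TemporaryDirectory() as base:
    name = os.path.join(base, "thing")
    os.mkdir(name)
    obj = classify(name)  # a DirectoryPath

    # The filesystem changes underneath the object: same name, now a file.
    os.rmdir(name)
    with open(name, "w") as f:
        f.write("now a regular file")

    # The Python type still claims "directory", but the filesystem disagrees.
    stale = isinstance(obj, DirectoryPath) and not os.path.isdir(name)

print(stale)
```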

>If the latter, then how would
>you deal with the issue of state coherency (i.e. it was a file when you last
>touched it on the filesystem, but someone else has since changed it to a
>link)? (that last question actually lends strong support to the idea of a
>*single* concrete interface that dynamically responds to changes in the
>underlying filesystem).

In non-filesystem cases, for example the "zip path" case, there are inherent failure modes that you can't really do anything about (what if the zip file is removed while you're in the middle of manipulating it?).  However, there are real applications which depend on the precise atomic semantics and error conditions associated with moving, renaming, and deleting directories and files, at least on POSIX systems.

The way Twisted does this is that FilePath objects explicitly cache the results of "stat" and then have an explicit "restat" method for resynchronizing with the current state of the filesystem.  None of their methods for *manipulating* the filesystem look at this state, since it is almost guaranteed to be out of date :).
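
A rough sketch of the caching pattern described above. This is not Twisted's implementation; `CachedStatPath` and its `getsize` and `remove` methods are hypothetical names, and only the idea of an explicit `restat` is taken from the description:

```python
import os
import tempfile


class CachedStatPath:
    """Hypothetical sketch of an explicit stat cache; not Twisted's FilePath."""

    def __init__(self, pathname):
        self.pathname = pathname
        self._stat = None  # nothing cached until restat() runs

    def restat(self):
        # Resynchronize the cached stat result with the filesystem.
        self._stat = os.stat(self.pathname)

    def getsize(self):
        # Read-only queries answer from the cache, which may be stale.
        if self._stat is None:
            self.restat()
        return self._stat.st_size

    def remove(self):
        # Manipulation goes straight to the OS and never consults the cache.
        os.remove(self.pathname)


with tempfile.TemporaryDirectory() as base:
    p = CachedStatPath(os.path.join(base, "data.txt"))
    with open(p.pathname, "w") as f:
        f.write("abc")
    p.restat()

    # Another writer changes the file after the stat was cached.
    with open(p.pathname, "a") as f:
        f.write("defgh")

    stale_size = p.getsize()  # still the cached value
    p.restat()
    fresh_size = p.getsize()  # reflects the current file

    p.remove()
    removed = not os.path.exists(p.pathname)

print(stale_size, fresh_size, removed)
```

The caller decides when staleness matters by calling `restat`, while `remove` lets the operating system report the real error conditions.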

>Another key difference between the two is that the abstract objects would be
>hashable and serialisable, as their state is immutable and independent of the
>filesystem. For the concrete objects, the only immutable part of their state
>is the path name - the rest would reflect the state of the filesystem at the
>current point in time.

To me, it doesn't really make sense to separate these; whenever you're serializing or hashing that information, the "mutable" parts should just be discarded.
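
One way to read this, sketched with a hypothetical `HashablePath` class: equality, hashing, and serialization use only the path name, and any cached filesystem state is dropped. The `to_json` and `from_json` names are assumptions chosen for illustration; JSON stands in for whatever serialization format is actually used:

```python
import json


class HashablePath:
    """Hypothetical concrete path whose identity is the path name alone."""

    def __init__(self, pathname):
        self.pathname = pathname
        self._stat = None  # mutable cached state, excluded from identity

    def __eq__(self, other):
        if not isinstance(other, HashablePath):
            return NotImplemented
        return self.pathname == other.pathname

    def __hash__(self):
        # Only the immutable part takes part in hashing.
        return hash(self.pathname)

    def to_json(self):
        # The cached state is simply discarded on the way out.
        return json.dumps({"pathname": self.pathname})

    @classmethod
    def from_json(cls, text):
        # A restored object starts with an empty cache.
        return cls(json.loads(text)["pathname"])


a = HashablePath("/tmp/example")
a._stat = ("pretend", "stat", "result")  # stand-in for cached state

b = HashablePath.from_json(a.to_json())
same = (a == b) and (hash(a) == hash(b))
print(same, b._stat)
```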
