[Python-Dev] Path object design

Wed Nov 1 04:14:13 CET 2006

I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying
that the first object-oriented proposal was rejected.  I'm in favor of
the "directory tuple" approach which wasn't mentioned in the thread.
This was proposed by Noam Raphael several months ago: a Path object
that's a sequence of components (a la os.path.split) rather than a
string.  The beauty of this approach is that slicing and joining are
expressed naturally using the [] and + operators, eliminating several
methods.
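
As a rough sketch of the idea (this is not the reference implementation
linked below; the TuplePath name and the POSIX-only handling are
assumptions made purely for illustration), a path stored as a tuple of
components gets slicing and joining almost for free:

```python
import posixpath


class TuplePath(tuple):
    """Hypothetical path stored as a tuple of components (POSIX-style only)."""

    def __new__(cls, path=()):
        if isinstance(path, str):
            # Keep the root "/" as its own first component for absolute paths.
            root = ("/",) if path.startswith("/") else ()
            parts = tuple(p for p in path.split("/") if p)
            path = root + parts
        return super().__new__(cls, path)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Slices stay paths; a single index returns the component string.
        return TuplePath(result) if isinstance(index, slice) else result

    def __add__(self, other):
        # Accept a single component string or any sequence of components.
        if isinstance(other, str):
            other = (other,)
        return TuplePath(tuple(self) + tuple(other))

    def __str__(self):
        return posixpath.join(*self) if self else "."
```

With this sketch, p[:-1] plays the role of a "parent" method and
p + "name" the role of a "join" method, so neither needs to exist
separately.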

Introduction:  http://wiki.python.org/moin/AlternativePathClass
Feature discussion:  http://wiki.python.org/moin/AlternativePathDiscussion
Reference implementation:  http://wiki.python.org/moin/AlternativePathModule

(There's a link to the introduction at the end of PEP 355.)  Right now
I'm working on a test suite, then I want to add the features marked
"Mike" in the discussion -- in a way that people can compare the
feature alternatives in real code -- and write a PEP.  But it's a big
job for one person, and there are unresolved issues on the discussion
page, not to mention things brought up in the "PEP 355 status" thread.
We had three people working on the discussion page but development
seems to have ground to a halt.

One thing is sure -- we urgently need something better than os.path.
It works, but it makes for hard-to-read and unpythonic code.  For
instance, I have an application that has to add its libraries to the
Python path, relative to the executable's location.

/toplevel
    app1/
        bin/
            main_program.py
            utility1.py
            init_app.py
        lib/
            app_module.py
    shared/
        lib/
            shared_module.py

The solution I've found is an init_app module in every application
that sets up the paths.  Conceptually it needs "../lib" and
"../../shared/lib", but I want the absolute paths without hardcoding
them, in a platform-neutral way.  With os.path, "../lib" is:

    os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")

YUK!  Compare to PEP 355:

    Path(__file__).parent.parent.join("lib")

Much easier to read and debug.  Under Noam's proposal it would be:

    Path(__file__)[:-2] + "lib"
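
For reference, here is roughly what the os.path version looks like once
both directories are computed.  This is only a sketch: the library_dirs
name is made up for illustration, and it assumes the layout shown above
with init_app.py living in bin/.

```python
import os.path


def library_dirs(script_path):
    """Return absolute paths for "../lib" and "../../shared/lib",
    both taken relative to the directory containing script_path."""
    bin_dir = os.path.dirname(os.path.abspath(script_path))
    app_dir = os.path.dirname(bin_dir)   # e.g. /toplevel/app1
    top_dir = os.path.dirname(app_dir)   # e.g. /toplevel
    return [
        os.path.join(app_dir, "lib"),
        os.path.join(top_dir, "shared", "lib"),
    ]
```

init_app.py could then do "import sys" followed by
"sys.path[:0] = library_dirs(__file__)".  It works, but every step up
the tree is another nested dirname call.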

I'd also like to see the methods more intelligent: don't raise an
error if an operation is already done (e.g., a directory exists or a
file is already removed).  There's no reason to clutter one's code
with extra if's when the methods can easily encapsulate this. Some
considered this too radical a departure from os.path, but I have
in mind even more radical convenience methods which I'd put in a
third-party subclass if they're not accepted into the standard
library, the way 'datetime' has third-party subclasses.
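
To make "more intelligent" concrete, here is a minimal sketch written as
plain functions rather than Path methods.  The names ensure_dir and
ensure_removed are hypothetical, and the sketch relies on features of
present-day Python (the exist_ok argument and FileNotFoundError):

```python
import os


def ensure_dir(path):
    """Create a directory (and any parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)


def ensure_removed(path):
    """Remove a file; do nothing if it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

Calling either function twice in a row is harmless, so the caller needs
no "if os.path.exists(...)" guard.  Other errors, such as a permission
problem, still propagate.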

In my application I started using Orendorff's path module, expecting
the standard path object would be close to it.  When PEP 355 started
getting more changes and the directory-based alternative took off, I
took path.py out and rewrote my code for os.path until an alternative
becomes more stable. Now it looks like it will be several months and
possibly several third-party packages until one makes it into the
standard library. This is unfortunate.  Not only does it mean ugly
code in applications, but it means packages can't accept or return
Path objects and expect them to be compatible with other packages.

The reasons PEP 355 was rejected also sound strange.  Nick Coghlan
wrote (Oct 1):

> Things the PEP 355 path object lumps together:
>   - string manipulation operations
>   - abstract path manipulation operations (work for non-existent filesystems)
>   - read-only traversal of a concrete filesystem (dir, stat, glob, etc)
>   - addition & removal of files/directories/links within a concrete filesystem

> Dumping all of these into a single class is certainly practical from a utility
> point of view, but it's about as far away from beautiful as you can get, which
> creates problems from a learnability point of view, and from a
> capability-based security point of view.

What about the convenience of the users and the beauty of users' code?
That's what matters to me.  And I consider one class *easier* to
learn.  I'm tired of memorizing that 'split' is in os.path while
'remove' and 'stat' are in os.  This seems arbitrary: you're statting
a path, aren't you?  Also, if you have four classes (abstract path,
file, directory, symlink), *each* of those will have 3+
platform-specific versions.  Then if you want to make an enhancement
subclass you'll have to make 12 of them, one for each of the 3*4
combinations of superclasses.  Encapsulation can help with this, but
it strays from the two-line convenience for the user:

    from path import Path
    p = Path("ABC")      # Works the same for files/directories on any platform.

Nevertheless, I'm open to seeing a multi-class API, though hopefully
less verbose than Talin's preliminary one (Oct 26).  Is it necessary
to support path.parent(), pathobj.parent(), io.dir.listdir(), *and*
io.dir.Directory()?  That's four different namespaces, and you have to
memorize which function/method is where.  If a function/method belongs
to multiple ones it'll be duplicated, and you'll have to remember that
some methods are duplicated and others aren't...  Plus, constructors
like io.dir.Directory() look too verbose.  io.Directory() might be
acceptable, with the functions as class methods.

I agree that supporting non-filesystem directories (zip files,
CVS/Subversion sandboxes, URLs) would be nice, but we already have a
big enough project without that.  What constraints should a Path
object keep in mind in order to be forward-compatible with this?

If anyone has design ideas/concerns about a new Path class (or classes), please
post them.  If anyone would like to work on a directory-based
spec/implementation, please email me.

-- 
Mike Orr <sluggoster at gmail.com>