[Python-Dev] Path object design
talin at acm.org
Wed Nov 1 04:20:50 CET 2006
I'm right in the middle of typing up a largish post to go on the
Python-3000 mailing list about this issue. Maybe we should move it over
there, since it's likely that any path reform will have to be targeted at
Mike Orr wrote:
> I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying
> that the first object-oriented proposal was rejected. I'm in favor of
> the "directory tuple" approach which wasn't mentioned in the thread.
> This was proposed by Noam Raphael several months ago: a Path object
> that's a sequence of components (a la os.path.split) rather than a
> string. The beauty of this approach is that slicing and joining are
> expressed naturally using the [] and + operators, eliminating several
> Introduction: http://wiki.python.org/moin/AlternativePathClass
> Feature discussion: http://wiki.python.org/moin/AlternativePathDiscussion
> Reference implementation: http://wiki.python.org/moin/AlternativePathModule
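The "directory tuple" idea can be illustrated with a short sketch. It is not the reference implementation linked above. The class name `TuplePath` and the helper `_split_all` are hypothetical, and the sketch assumes input strings are normalized with `os.path.normpath` (which also collapses `..` components) before being split.

```python
import os


def _split_all(path):
    """Split a path string into components with repeated os.path.split().

    A root such as "/" (or a drive like "C:\\" on Windows) is kept as the
    first component so that os.path.join() can rebuild the path.
    """
    parts = []
    while True:
        head, tail = os.path.split(path)
        if tail:
            parts.append(tail)
            path = head
        else:
            if head:
                parts.append(head)  # the root, e.g. "/"
            break
    return list(reversed(parts))


class TuplePath(tuple):
    """Hypothetical path object that is a tuple of components.

    Slicing with [] gives a shorter path; + appends components.
    """

    def __new__(cls, path=()):
        if isinstance(path, str):
            # Normalize first; note that normpath also collapses "..".
            path = _split_all(os.path.normpath(path))
        return super().__new__(cls, path)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # A slice is still a path; a single index is a component string.
        return TuplePath(result) if isinstance(index, slice) else result

    def __add__(self, other):
        if isinstance(other, str):
            other = TuplePath(other)
        return TuplePath(tuple(self) + tuple(other))

    def __str__(self):
        return os.path.join(*self) if self else ""

    def __repr__(self):
        return "TuplePath(%r)" % str(self)
```

On a POSIX system, `str(TuplePath("/opt/app/bin/init_app.py")[:-2] + "lib")` gives `"/opt/app/lib"`, mirroring the `[:-2] + "lib"` expression quoted later in this message.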
> (There's a link to the introduction at the end of PEP 355.) Right now
> I'm working on a test suite, then I want to add the features marked
> "Mike" in the discussion -- in a way that people can compare the
> feature alternatives in real code -- and write a PEP. But it's a big
> job for one person, and there are unresolved issues on the discussion
> page, not to mention things brought up in the "PEP 355 status" thread.
> We had three people working on the discussion page but development
> seems to have ground to a halt.
> One thing is sure -- we urgently need something better than os.path.
> It functions well but it makes hard-to-read and unpythonic code. For
> instance, I have an application that has to add its libraries to the
> Python path, relative to the executable's location.
> The solution I've found is an init_app module in every application
> that sets up the paths. Conceptually it needs "../lib" and
> "../../shared/lib", but I want the absolute paths without hardcoding
> them, in a platform-neutral way. With os.path, "../lib" is:
> os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")
> YUK! Compare to PEP 355:
> Much easier to read and debug. Under Noam's proposal it would be:
> Path(__file__)[:-2] + "lib"
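For comparison, a complete os.path version of the init_app idea might look like the sketch below. The function names `library_dirs` and `init_app` are hypothetical; the sketch assumes the libraries live at "../lib" and "../../shared/lib" relative to the directory containing the calling module, and it places them at the front of `sys.path`.

```python
import os
import sys


def library_dirs(script_file):
    """Return absolute paths for ../lib and ../../shared/lib, relative to
    the directory that contains script_file (hypothetical helper)."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    parent = os.path.dirname(script_dir)
    grandparent = os.path.dirname(parent)
    return [
        os.path.join(parent, "lib"),
        os.path.join(grandparent, "shared", "lib"),
    ]


def init_app(script_file):
    """Prepend the library directories to sys.path, keeping their order."""
    # Insert in reverse so the first entry of library_dirs() ends up first.
    for lib_dir in reversed(library_dirs(script_file)):
        if lib_dir not in sys.path:
            sys.path.insert(0, lib_dir)
```

An application module would then call `init_app(__file__)` before importing its own libraries.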
> I'd also like the methods to be more intelligent: don't raise an
> error if an operation is already done (e.g., a directory exists or a
> file is already removed). There's no reason to clutter one's code
> with extra ifs when the methods can easily encapsulate this. Some
> considered this too radical a departure from os.path, but I have
> in mind even more radical convenience methods which I'd put in a
> third-party subclass if they're not accepted into the standard
> library, the way 'datetime' has third-party subclasses.
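The "don't raise if it's already done" behaviour can be sketched with the modern standard library, assuming Python 3.4 or later. The helper names `ensure_dir` and `remove_if_exists` are hypothetical; `os.makedirs(..., exist_ok=True)` and `contextlib.suppress` are real standard-library calls.

```python
import contextlib
import os


def ensure_dir(path):
    """Create a directory (and any missing parents); do nothing if it exists.

    Still raises FileExistsError if `path` exists but is not a directory.
    """
    os.makedirs(path, exist_ok=True)


def remove_if_exists(path):
    """Remove a file, treating an already-missing file as success."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

Calling either helper twice in a row is safe, which removes the need for an `if os.path.exists(...)` guard at each call site.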
> In my application I started using Orendorff's path module, expecting
> the standard path object would be close to it. When PEP 355 started
> getting more changes and the directory-based alternative took off, I
> took path.py out and rewrote my code to use os.path until an alternative
> becomes more stable. Now it looks like it will be several months and
> possibly several third-party packages until one makes it into the
> standard library. This is unfortunate. Not only does it mean ugly
> code in applications, but it means packages can't accept or return
> Path objects and expect them to be compatible with other packages.
> The reasons PEP 355 was rejected also sound strange. Nick Coghlan
> wrote (Oct 1):
>> Things the PEP 355 path object lumps together:
>> - string manipulation operations
>> - abstract path manipulation operations (work for non-existent filesystems)
>> - read-only traversal of a concrete filesystem (dir, stat, glob, etc)
>> - addition & removal of files/directories/links within a concrete filesystem
>> Dumping all of these into a single class is certainly practical from a utility
>> point of view, but it's about as far away from beautiful as you can get, which
>> creates problems from a learnability point of view, and from a
>> capability-based security point of view.
> What about the convenience of the users and the beauty of users' code?
> That's what matters to me. And I consider one class *easier* to
> learn. I'm tired of memorizing that 'split' is in os.path while
> 'remove' and 'stat' are in os. This seems arbitrary: you're statting
> a path, aren't you? Also, if you have four classes (abstract path,
> file, directory, symlink), *each* of those will have 3+
> platform-specific versions. Then if you want to make an enhancement
> subclass you'll have to make 12 of them, one for each of the 3*4
> combinations of superclasses. Encapsulation can help with this, but
> it strays from the two-line convenience for the user:
> from path import Path
> p = Path("ABC") # Works the same for files/directories on any platform.
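One way encapsulation could avoid the 3*4 subclass combinations while keeping the two-line usage is to use a single path class that delegates platform rules to a path module such as the standard `posixpath` or `ntpath`, defaulting to `os.path`. This hypothetical sketch covers only two operations, and the class names `Path` and `MyPath` are illustrative, not the PEP 355 API.

```python
import ntpath
import os
import posixpath


class Path:
    """Hypothetical single path class: platform rules live in `flavour`,
    a path module such as posixpath or ntpath, not in subclasses."""

    def __init__(self, text, flavour=os.path):
        self.text = text
        self.flavour = flavour

    def _new(self, text):
        # type(self) keeps enhancement subclasses intact across operations.
        return type(self)(text, self.flavour)

    def join(self, *parts):
        return self._new(self.flavour.join(self.text, *parts))

    def parent(self):
        return self._new(self.flavour.dirname(self.text))

    def __str__(self):
        return self.text


class MyPath(Path):
    """An enhancement subclass: written once, works for every flavour."""

    def grandparent(self):
        return self.parent().parent()
```

With this design, `MyPath("C:\\opt\\app\\bin", ntpath).grandparent()` yields `C:\opt` on any host, and `MyPath("/opt/app/bin", posixpath).grandparent()` yields `/opt`, with one enhancement subclass instead of twelve.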
> Nevertheless, I'm open to seeing a multi-class API, though hopefully
> less verbose than Talin's preliminary one (Oct 26). Is it necessary
> to support path.parent(), pathobj.parent(), io.dir.listdir(), *and*
> io.dir.Directory()? That's four different namespaces to memorize
> which function/method is where, and if a function/method belongs to
> multiple ones it'll be duplicated, and you'll have to remember that
> some methods are duplicated and others aren't... Plus, constructors
> like io.dir.Directory() look too verbose. io.Directory() might be
> acceptable, with the functions as class methods.
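A minimal sketch of the `io.Directory()` suggestion, with the former module-level functions available as class methods, could look like this. The class and method names (`Directory`, `listdir`, `make`, `children`) are hypothetical and chosen only for illustration.

```python
import os


class Directory:
    """Hypothetical directory class; one namespace for functions and methods."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def listdir(cls, path):
        """Function-style call: Directory.listdir("some/dir")."""
        return sorted(os.listdir(path))

    @classmethod
    def make(cls, path):
        """Create the directory if needed and return a Directory for it."""
        os.makedirs(path, exist_ok=True)
        return cls(path)

    def children(self):
        """Method-style call on an instance, reusing the class method."""
        return type(self).listdir(self.path)
```

Both `Directory.listdir(path)` and `Directory(path).children()` live in the same class, so there is only one place to look.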
> I agree that supporting non-filesystem directories (zip files,
> CVS/Subversion sandboxes, URLs) would be nice, but we already have a
> big enough project without that. What constraints should a Path
> object keep in mind in order to be forward-compatible with this?
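One possible constraint, sketched under assumptions: represent a path as a sequence of component names and keep all I/O behind a small interface that a filesystem, a zip file, or another container could implement. The read-only interface below has just two operations; `DirectorySource`, `FileSystemSource`, and `ZipSource` are hypothetical names, while `zipfile.ZipFile.namelist()` and `read()` are real standard-library calls.

```python
import abc
import os
import zipfile


class DirectorySource(abc.ABC):
    """Hypothetical read-only interface; a path is a sequence of names."""

    @abc.abstractmethod
    def listdir(self, parts):
        """Return the sorted entry names under the given components."""

    @abc.abstractmethod
    def read_bytes(self, parts):
        """Return the contents of the file named by the components."""


class FileSystemSource(DirectorySource):
    def __init__(self, root):
        self.root = root

    def listdir(self, parts):
        return sorted(os.listdir(os.path.join(self.root, *parts)))

    def read_bytes(self, parts):
        with open(os.path.join(self.root, *parts), "rb") as f:
            return f.read()


class ZipSource(DirectorySource):
    def __init__(self, zip_path):
        self.archive = zipfile.ZipFile(zip_path)

    def listdir(self, parts):
        # Zip member names always use "/", whatever the host platform.
        prefix = "/".join(parts) + "/" if parts else ""
        names = set()
        for name in self.archive.namelist():
            if name.startswith(prefix) and name != prefix:
                names.add(name[len(prefix):].split("/", 1)[0])
        return sorted(names)

    def read_bytes(self, parts):
        return self.archive.read("/".join(parts))

    def close(self):
        self.archive.close()
```

Because callers pass components rather than platform-specific strings, the same code can walk a real directory tree or a zip archive.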
> If anyone has design ideas/concerns about a new Path class(es), please
> post them. If anyone would like to work on a directory-based
> spec/implementation, please email me.