[Python-Dev] Alternative path suggestion

Thu May 4 10:13:18 CEST 2006

On 5/2/06, Noam Raphael <noamraph at gmail.com> wrote:
> Here are my ideas. It's a copy of what I posted a few minutes ago in
> the wiki - you can view it at
> http://wiki.python.org/moin/AlternativePathClass (it looks better
> there).
>
> You can find the implementation at
> http://wiki.python.org/moin/AlternativePathModule?action=raw
> (By the way, is there some "code wiki" available? It can simply be a
> public svn repository. I think it will be useful for those things.)

Intriguing idea, Noam, and excellent thinking.  I'd say it's worth a
separate PEP.  It's too different to fit into PEP 355, and too big to
be summarized in the "Open Issues" section.  Of course, one PEP will
be rejected if the other is approved.

The main difficulty with this approach is that it's so radical.  It
would require a serious champion to convince people it's as good as
our tried-and-true strings.

> == a tuple instead of a string ==
>
> The biggest conceptual change is that my path object is a subclass of
> ''tuple'', not a subclass of str. For example,
> {{{
> >>> tuple(path('a/b/c'))
> ('a', 'b', 'c')
> >>> tuple(path('/a/b/c'))
> (path.ROOT, 'a', 'b', 'c')
> }}}

How about an .isabsolute attribute instead of prepending path.ROOT?
I can see arguments both ways.  An attribute is easy to query and easy
for str() to use, but it wouldn't show up in a tuple-style repr().
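
To make the trade-off concrete, here is a rough sketch of both options.
Everything in it (Root, parse_with_root, AttrPath) is a hypothetical
stand-in, not code from the wiki module, and it only handles
POSIX-style strings.

```python
class Root:
    """Hypothetical sentinel standing in for path.ROOT."""

    def __repr__(self):
        return "ROOT"


ROOT = Root()


def parse_with_root(s):
    """Option A: absoluteness is stored as the first tuple element."""
    parts = tuple(p for p in s.split("/") if p)
    return (ROOT,) + parts if s.startswith("/") else parts


class AttrPath(tuple):
    """Option B: absoluteness is an attribute on a tuple subclass."""

    def __new__(cls, s):
        self = super().__new__(cls, (p for p in s.split("/") if p))
        self.isabsolute = s.startswith("/")
        return self

    def __str__(self):
        # The attribute is easy for str() to consult...
        return ("/" if self.isabsolute else "") + "/".join(self)


a = parse_with_root("/a/b/c")   # the root is visible in the tuple itself
b = AttrPath("/a/b/c")          # ...but repr(b) shows only the components
```

One consequence of option B in this sketch: because tuple equality
only compares elements, AttrPath("/a/b") == AttrPath("a/b") is true
unless __eq__ is overridden as well.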

> This means that path objects aren't the string representation of a
> path; they are a ''logical'' representation of a path. Remember why a
> filesystem path is called a path - because it's a way to get from one
> place on the filesystem to another. Paths can be relative, which means
> that they don't define from where to start the walk, and can be not
> relative, which means that they do. In the tuple representation,
> relative paths are simply tuples of strings, and not relative paths
> are tuples of strings with a first "root" element.
>
> The advantage of using a logical representation is that you can forget
> about the textual representation, which can be really complex. You
> don't have to call normpath when you're unsure about how a path looks,
> you don't have to search for seps and altseps, and... you don't need
> to remember a lot of names of functions or methods. To show that, take
> a look at those methods from the original path class and their
> equivalent in my path class:
>
> {{{
> p.normpath()  -> Isn't needed - done by the constructor
> p.basename()  -> p[-1]
> p.splitpath() -> (p[:-1], p[-1])
> p.splitunc()  -> (p[0], p[1:]) (if isinstance(p[0], path.UNCRoot))
> p.splitall()  -> Isn't needed
> p.parent      -> p[:-1]
> p.name        -> p[-1]
> p.drive       -> p[0] (if isinstance(p[0], path.Drive))
> p.uncshare    -> p[0] (if isinstance(p[0], path.UNCRoot))
>
> and of course:
> p.join(q) [or anything like it] -> p + q
> }}}

All that slicing is cool.
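
For anyone who hasn't looked at the module, the equivalences in the
table can be tried on an ordinary tuple.  This is only an illustration
with a plain tuple standing in for the proposed path class.

```python
# A plain tuple standing in for path('usr/local/lib/python').
p = ("usr", "local", "lib", "python")

basename = p[-1]              # p.basename() / p.name
parent = p[:-1]               # p.parent
head, tail = p[:-1], p[-1]    # p.splitpath()
joined = p + ("site-packages",)   # p.join(q)
```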

> The only drawback I can see in using a logical representation is that
> giving a path object to functions which expect a path string won't
> work. The immediate solution is to simply use str(p) instead of p. The
> long-term solution is to make all related functions accept a path
> object.

That's a big drawback.  PEP 355 can choose between string and
non-string, but this way is limited to non-string.  That raises the
minor issue of changing the open() functions etc in the standard
library, and the major issue of changing them in third-party
libraries.

> Having a logical representation of a path calls for a bit of term
> clearing-up. What's an absolute path? On POSIX, it's very simple: a
> path starting with a '/'. But what about Windows? Is "\temp\file" an
> absolute path? I claim that it isn't really. The reason is that if you
> change the current working directory, its meaning changes: It's now
> not "c:\temp\file", but "a:\temp\file". The same goes for
> "c:temp\file". So I decided on these two definitions:
>
>  * A ''relative path'' is a path without a root element, so it can be
> concatenated to other paths.
>  * An ''absolute path'' is a path whose meaning doesn't change when
> the current working directory changes.
>
> This means that paths starting with a drive letter alone
> (!UnrootedDrive instance, in my module) and paths starting with a
> backslash alone (the CURROOT object, in my module) are not relative
> and not absolute.

I guess that's plausible.  We'll need feedback from Windows users.
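
As a way of checking the three-way split against real strings, here is
a small sketch.  classify() is a hypothetical helper, not part of
either proposal; it relies only on ntpath.splitdrive from the standard
library.

```python
import ntpath


def classify(p):
    """Classify a Windows path string using the two definitions above."""
    drive, rest = ntpath.splitdrive(p)
    rooted = rest.startswith(("\\", "/"))
    if drive and rooted:
        return "absolute"      # meaning is independent of the cwd
    if not drive and not rooted:
        return "relative"      # no root element, can be concatenated
    return "neither"           # drive without root, or root without drive


results = {p: classify(p) for p in
           (r"c:\temp\file", r"\temp\file", r"c:temp\file", r"temp\file")}
```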

> I really think that it's a better way to handle paths. If you want an
> example, compare the current implementation of relpathto and my
> implementation.

In my enhanced Path class (I posted the docstring last summer), I made
a .relpathfrom() function because .relpathto() was so confusing.
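
To show why the tuple form makes this computation simpler, here is a
hypothetical sketch on plain tuples.  It is not the implementation
from either class, and relpath_from is a made-up name.

```python
def relpath_from(target, start):
    """Return the relative tuple that leads from `start` to `target`.

    Both arguments are tuples of path components sharing the same root.
    """
    # Count how many leading components the two paths have in common.
    common = 0
    for a, b in zip(target, start):
        if a != b:
            break
        common += 1
    # Climb out of what is left of `start`, then descend into `target`.
    return ("..",) * (len(start) - common) + tuple(target[common:])


rel = relpath_from(("a", "b", "c"), ("a", "x"))
```

Like any '..'-based answer, the result is only trustworthy if none of
the components being climbed out of is a symbolic link.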

> == Easier attributes for stat objects ==
>
> The current path object includes:
>  * isdir, isfile, islink, and -
>  * atime, mtime, ctime, size.
> The first line does file mode checking, and the second simply gives
> attributes from the stat object.
>
> I suggest that these should be added to the stat_result object. isdir,
> isfile and islink are true if a specific bit in st_mode is set, and
> atime, mtime, ctime and size are simply other names for st_atime,
> st_mtime, st_ctime and st_size.
>
> It means that instead of using the atime, mtime etc. methods, you will
> write {{{ p.stat().atime }}}, {{{ p.stat().size }}}, etc.
>
> This is good, because:
>  * If you want to make only one system call, it's very easy to save
> the stat object and use it.
>  * If you have to deal with symbolic links, you can simply use {{{
> p.lstat().mtime }}}. Yes, symbolic links have a modification time. The
> alternative is to add three methods with ugly names (latime, lmtime,
> lctime) or to have an incomplete interface without a good reason.
>
> I think that isfile, isdir should be kept (along with lisfile,
> lisdir), since I think that doing what they do is quite common, and
> requires six lines:
> {{{
> try:
>     st = p.stat()
> except OSError:
>     return False
> else:
>     return st.isdir
> }}}
>
> I think that still, isdir, isfile and islink should be added to
> stat_result objects: They turned out pretty useful in writing some of
> the more complex path methods.

Not sure about this.  I see the point in not duplicating .foo() vs
.stat().foo.  .foo() exists in os.path to avoid the ugliness of
os.stat() in the middle of an expression.  I think the current
recommendation is to just do stats all the time because the overhead
is minimal and it's not worth getting out of sync.

The question is, does forcing people to use .stat() expose an
implementation detail that should be hidden, and does it smell of
Unixism?  Most people think a file *is* a regular file or a directory.
The fact that this is encoded in the file's mode bits -- which stat()
examines -- is a quirk of Unix.
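
For reference, this is roughly what the "one system call" style looks
like with today's standard library.  stat_result has no isdir or size
attributes, so the sketch uses the stat module's S_ISDIR/S_ISREG
helpers and st_size; the temporary file is only there to have
something to stat.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    fname = os.path.join(d, "f.txt")
    with open(fname, "w") as f:
        f.write("hello")

    st = os.stat(fname)            # one system call, reused below
    file_is_regular = stat.S_ISREG(st.st_mode)
    file_is_dir = stat.S_ISDIR(st.st_mode)
    file_size = st.st_size

    dir_is_dir = stat.S_ISDIR(os.stat(d).st_mode)
```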

> == One Method for Finding Files ==
>
> (They're actually two, but with exactly the same interface). The
> original path object has these methods for finding files:
>
> {{{
> def listdir(self, pattern = None): ...
> def dirs(self, pattern = None): ...
> def files(self, pattern = None): ...
> def walk(self, pattern = None): ...
> def walkdirs(self, pattern = None): ...
> def walkfiles(self, pattern = None): ...
> def glob(self, pattern):
> }}}
>
> I suggest one method that replaces all those:
> {{{
> def glob(self, pattern='*', topdown=True, onlydirs=False, onlyfiles=False): ...
> }}}
>
> pattern is the good old glob pattern, with one additional extension:
> "**" matches any number of subdirectories, including 0. This means
> that '**' means "all the files in a directory", '**/a' means "all the
> files in a directory called a", and '**/a*/**/b*' means "all the files
> in a directory whose name starts with 'b' and the name of one of their
> parent directories starts with 'a'".

I like the separate methods, but OK.  I hope it doesn't *really* call
glob if the pattern is the default.

How about 'dirs' and 'files' instead of 'onlydirs' and 'onlyfiles'?
Or one could, gasp, pass a constant or the 'find' command's
abbreviation ("d" directory, "f" file, "s" socket, "b" block
special...).
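
A minimal sketch of the 'find'-style variant, assuming a single kind
argument that takes "d" or "f".  The find() function and its signature
are hypothetical; it is built on os.walk and fnmatch rather than on
glob, and it matches the pattern against names only.

```python
import fnmatch
import os
import tempfile


def find(top, pattern="*", kind=None):
    """Yield paths under `top` whose name matches `pattern`.

    kind: None for everything, "d" for directories, "f" for files.
    """
    if kind not in (None, "d", "f"):
        raise ValueError("kind must be None, 'd' or 'f'")
    for dirpath, dirnames, filenames in os.walk(top):
        names = []
        if kind in (None, "d"):
            names += dirnames
        if kind in (None, "f"):
            names += filenames
        for name in names:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)


# Small demonstration on a throwaway tree.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "src", "docs"))
    for name in ("src/a.py", "src/docs/b.py", "src/docs/readme.txt"):
        with open(os.path.join(top, *name.split("/")), "w"):
            pass
    py_files = sorted(os.path.relpath(p, top).replace(os.sep, "/")
                      for p in find(top, "*.py", kind="f"))
    dirs = sorted(os.path.relpath(p, top).replace(os.sep, "/")
                  for p in find(top, kind="d"))
```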

In my enhanced Path class, I used a boolean 'symlinks' arg meaning
"follow symlinks" (default True).  This allows the user to process
symbolic links separately if he wishes, or to ignore them if he
doesn't care.  Separate .symlinks() and .walklinks() methods handle
the case that the user does want to treat them specially.  This seems
to be more intuitive and flexible than just your .symlinks() method
below.

Not sure I like "**/" over a 'recursive' argument because the syntax
is so novel and nonstandard.

You mean "ancestor" in the last paragraph, not "parent".  Parent is
often read as immediate parent, and ancestor as any number of levels
up.

> == Reduce the Number of Methods ==
>
> I think that the number of methods should be reduced. The most obvious
> examples are the copy functions. In the current proposal:
>
> {{{
> def copyfile(self, dst): ...
> def copymode(self, dst): ...
> def copystat(self, dst): ...
> def copy(self, dst): ...
> def copy2(self, dst): ...
> }}}
>
> In my proposal:
>
> {{{
> def copy(self, dst, copystat=False): ...
> }}}
>
> It's just that I think that copyfile, copymode and copystat aren't
> usually useful, and there's no reason not to unite copy and copy2.

Sounds good.
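
A sketch of how the merged method could sit on top of shutil, written
as a free function for brevity.  The copy() wrapper is an assumption
about the intended behaviour; shutil.copy and shutil.copy2 are the
existing library calls.

```python
import os
import shutil
import tempfile


def copy(src, dst, copystat=False):
    """Copy src to dst; with copystat=True also preserve timestamps etc."""
    if copystat:
        return shutil.copy2(src, dst)   # data + stat info
    return shutil.copy(src, dst)        # data + permission bits only


# Demonstration: the modification time survives when copystat=True.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.txt")
    with open(src, "w") as f:
        f.write("data")
    os.utime(src, (1000000000, 1000000000))

    dst = copy(src, os.path.join(d, "dst.txt"), copystat=True)
    with open(dst) as f:
        copied_text = f.read()
    copied_mtime = int(os.stat(dst).st_mtime)
```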

> = Other Changes =
>
> Here is a list of the smaller things I've changed in my proposal.
>
> The current normpath removes '..' with the name before them. I didn't
> do that, because it doesn't return an equivalent path if the path
> before the '..' is a symbolic link.

I was wondering what the fallout would be of normalizing "a/../b" and
"a/./b" and "a//b", but it sounds like you're thinking about it.
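
For comparison, this is what the current string-based normpath does
with those three inputs, next to a hypothetical component normalizer
that keeps '..' as described.  normalize_parts is a made-up name and
only handles relative POSIX-style strings.

```python
import posixpath

# Existing behaviour: '..' is collapsed together with the name before it.
collapsed = [posixpath.normpath(s) for s in ("a/../b", "a/./b", "a//b")]


def normalize_parts(s):
    """Drop empty and '.' components but leave '..' in place."""
    return tuple(p for p in s.split("/") if p not in ("", "."))


kept = [normalize_parts(s) for s in ("a/../b", "a/./b", "a//b")]
```

If "a" is a symbolic link to some other directory, "a/../b" and "b"
can name different files, which is why the second form leaves '..'
alone.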

> I removed the methods associated with file extensions. I don't recall
> using them, and since they're purely textual and not OS-dependent, I
> think that you can always do p[-1].rsplit('.', 1).

No, .ext and .namebase are important!  I use them all the time; e.g.,
to force the extension to lowercase, to insert/update a suffix to the
basename and re-add the extension, etc.  I can see cutting the other
properties but not these.

.namebase is an obnoxious name though.  I wish we could come up with
something better.
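
The two uses mentioned above look like this when written against
os.path.splitext.  The helper names lower_ext and add_suffix are
hypothetical; the point is only how often the (namebase, ext) split
comes up.

```python
import os.path


def lower_ext(name):
    """Force the extension to lowercase, leaving the base name alone."""
    base, ext = os.path.splitext(name)
    return base + ext.lower()


def add_suffix(name, suffix):
    """Insert a suffix after the base name and re-add the extension."""
    base, ext = os.path.splitext(name)
    return base + suffix + ext


photo = lower_ext("PHOTO.JPG")
draft = add_suffix("report.txt", "-draft")
```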

> I removed unlink. It's an alias to remove, as far as I know.

.remove() was added because .unlink() is cryptic to non-Unix users.
Although I'd argue that .delete() is more universally understood than
either.

> I removed expand. There's no need to use normpath, so it's equivalent
> to .expanduser().expandvars(), and I think that the explicit form is
> better.

Expand is useful though, so you don't forget one or the other.
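
Without normpath in the picture, a convenience wrapper would be a
one-liner.  This sketch is a free function over strings, and the
DEMO_DIR variable is invented for the demonstration.

```python
import os


def expand(p):
    """Apply both expansions so callers cannot forget one of them."""
    return os.path.expandvars(os.path.expanduser(p))


os.environ["DEMO_DIR"] = "projects"
expanded = expand("$DEMO_DIR/notes.txt")
```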

> copytree - I removed it. In shutil it's documented as being mostly a
> demonstration, and I'm not sure if it's really useful.

Er, not sure I've used it, but it seems useful.  Why force people to
reinvent the wheel with their own recursive loops that they may get
wrong?

> symlink - Instead of a function like copy, with the destination as the
> second (actually, the only) argument, I wrote "writelink", which gets
> a string and creates a symbolic link with that value. The reason is
> that symbolic links can be any string, not necessarily a legal path.

Does that mean you have to build a relative link yourself if you're
going from one directory to another?
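
What I have in mind is something like the following, where the caller
gets the relative target computed for them.  relative_link_target is a
hypothetical helper; posixpath.relpath and posixpath.dirname are the
real library functions, and the example paths are made up.

```python
import posixpath


def relative_link_target(target, link):
    """Return the string to store in `link` so that it points at `target`.

    Both arguments are absolute POSIX paths; the result is relative to
    the directory that will contain the link.
    """
    return posixpath.relpath(target, posixpath.dirname(link))


value = relative_link_target("/a/b/c.txt", "/a/d/link")
```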

> p.ext          -> ''.join(p[-1].rsplit('.', 1)[1:])

People shouldn't have to do this.
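
Besides being hard to read, the expression doesn't quite agree with
os.path.splitext.  A quick comparison, with ext_by_rsplit as a
hypothetical wrapper around the quoted one-liner:

```python
import os.path


def ext_by_rsplit(name):
    """The quoted expression, applied to a single file name."""
    return "".join(name.rsplit(".", 1)[1:])


names = ("archive.tar.gz", "README", ".bashrc")
by_rsplit = [ext_by_rsplit(n) for n in names]
by_splitext = [os.path.splitext(n)[1] for n in names]
```

The rsplit form drops the dot and treats a leading-dot file such as
".bashrc" as all extension, while splitext keeps the dot and reports
no extension for it.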

> Unicode - I have no idea about unicode paths. My current
> implementation simply uses str. This should be changed, I guess.

No idea about that either.

You've got two issues here.  One is to go to a tuple base and replace
several properties with slicing.  The other is all your other proposed
changes.  Ideally the PEP would be written in a way that these other
changes can be propagated back and forth between the PEPs as consensus
builds.

--
Mike Orr <sluggoster at gmail.com>
(mso at oz.net address is semi-reliable)