The last time this was discussed, six months ago, it seemed like most of python-dev fancied Jason Orendorff's path module. But Guido wanted a PEP and no one created one. So I decided to claim the fame and write one, since I also love the path module. :) Much of it is copy-pasted from Peter Astrand's PEP 324, many thanks to him.

#################################################

PEP: XXX
Title: path - Object oriented handling of filesystem paths
Version: $Revision: XXXX $
Last-Modified: $Date: 2006-01-24 19:24:59 +0100 (Sat, 29 Jan 2005) $
Author: Björn Lindqvist <bjourne@gmail.com>
Status: Dummo
Type: Standards Track (library)
Created: 24-Jan-2006
Content-Type: text/plain
Python-Version: 2.4

Abstract

This PEP describes a new class for handling paths in an object oriented fashion.

Motivation

Dealing with filesystem paths is a common task in any programming language, and very common in a high-level language like Python. Good support for this task is needed, because:

- Almost every program uses paths to access files. It makes sense that a task that is performed so often should be as intuitive and as easy to perform as possible.
- It makes Python an even better replacement language for over-complicated shell scripts.

Currently, Python has a large number of different functions scattered over half a dozen modules for handling paths. This makes it hard for newbies and experienced developers to choose the right method.

The Path class provides the following enhancements over the current common practice:

- One "unified" object provides all functionality from previous functions.
- Subclassability - the Path object can be extended to support paths other than filesystem paths. The programmer does not need to learn a new API, but can reuse his or her knowledge of Path to deal with the extended class.
- With all related functionality in one place, the right approach is easier to learn, as one does not have to hunt through many different modules for the right functions.
- Python is an object oriented language. Just like files, datetimes and sockets are objects, so are paths; they are not merely strings to be passed to functions. Path objects are inherently a pythonic idea.
- Path takes advantage of properties. Properties make for more readable code:

      if imgpath.ext == '.jpg':
          jpegdecode(imgpath)

  is better than:

      if os.path.splitext(imgpath)[1] == '.jpg':
          jpegdecode(imgpath)

Rationale

The following points summarize the design:

- Path extends from string, therefore all code which expects string pathnames need not be modified and no existing code will break.
- A Path object can be created either by using the classmethod Path.cwd, by instantiating the class with a string representing a path, or by using the default constructor, which is equivalent to Path(".").
- The factory functions in popen2 have been removed, because I consider the class constructor equally easy to work with.
- Path provides for common pathname manipulation, pattern expansion, pattern matching and other high-level file operations including copying. Basically everything path related except for manipulation of file contents, which file objects are better suited for.
- Platform incompatibilities are dealt with by not instantiating system specific methods.

Specification

This class defines the following public methods:

    # Special Python methods.
    def __new__(cls, init = os.curdir): ...
    def __repr__(self): ...
    def __add__(self, more): ...
    def __radd__(self, other): ...
    def __div__(self, rel): ...
    def __truediv__(self, rel): ...

    # Alternative constructor.
    def cwd(cls): ...

    # Operations on path strings.
    def abspath(self): ...
    def normcase(self): ...
    def normpath(self): ...
    def realpath(self): ...
    def expanduser(self): ...
    def expandvars(self): ...
    def dirname(self): ...
    def basename(self): ...
    def expand(self): ...
    def splitpath(self): ...
    def splitdrive(self): ...
    def splitext(self): ...
    def stripext(self): ...
    def splitunc(self): ... [1]
    def joinpath(self, *args): ...
    def splitall(self): ...
    def relpath(self): ...
    def relpathto(self, dest): ...

    # Properties about the path.
    parent, name, namebase, ext, drive, uncshare[1]

    # Operations that return lists of paths.
    def listdir(self, pattern = None): ...
    def dirs(self, pattern = None): ...
    def files(self, pattern = None): ...
    def walk(self, pattern = None): ...
    def walkdirs(self, pattern = None): ...
    def walkfiles(self, pattern = None): ...
    def match(self, pattern): ...
    def matchcase(self, pattern): ...
    def glob(self, pattern): ...

    # Methods for retrieving information about the filesystem
    # path.
    def exists(self): ...
    def isabs(self): ...
    def isdir(self): ...
    def isfile(self): ...
    def islink(self): ...
    def ismount(self): ...
    def samefile(self, other): ... [1]
    def getatime(self): ...
    def getmtime(self): ...
    def getctime(self): ...
    def getsize(self): ...
    def access(self, mode): ... [1]
    def stat(self): ...
    def lstat(self): ...
    def statvfs(self): ... [1]
    def pathconf(self, name): ... [1]
    def utime(self, times): ...
    def chmod(self, mode): ...
    def chown(self, uid, gid): ... [1]
    def rename(self, new): ...
    def renames(self, new): ...

    # Filesystem properties for path.
    atime, mtime, ctime, size

    # Methods for manipulating information about the filesystem
    # path.
    def utime(self, times): ...
    def chmod(self, mode): ...
    def chown(self, uid, gid): ... [1]
    def rename(self, new): ...
    def renames(self, new): ...

    # Create/delete operations on directories
    def mkdir(self, mode = 0777): ...
    def makedirs(self, mode = 0777): ...
    def rmdir(self): ...
    def removedirs(self): ...

    # Modifying operations on files
    def touch(self): ...
    def remove(self): ...
    def unlink(self): ...

    # Modifying operations on links
    def link(self, newpath): ...
    def symlink(self, newlink): ...
    def readlink(self): ...
    def readlinkabs(self): ...

    # High-level functions from shutil
    def copyfile(self, dst): ...
    def copymode(self, dst): ...
    def copystat(self, dst): ...
    def copy(self, dst): ...
    def copy2(self, dst): ...
    def copytree(self, dst, symlinks = True): ...
    def move(self, dst): ...
    def rmtree(self, ignore_errors = False, onerror = None): ...

    # Special stuff from os
    def chroot(self): ... [1]
    def startfile(self): ... [1]

[1] - Method is not available on all platforms.

Replacing older functions with the Path class

In this section, "a ==> b" means that b can be used as a replacement for a.

In the following examples, we assume that the Path class is imported with "from path import Path".

Replacing os.path.join
----------------------

    os.path.join(os.getcwd(), "foobar")
    ==>
    Path.cwd() / "foobar"

Replacing os.path.splitext
--------------------------

    os.path.splitext("Python2.4.tar.gz")[1]
    ==>
    Path("Python2.4.tar.gz").ext

Replacing glob.glob
-------------------

    glob.glob("/lib/*.so")
    ==>
    Path("/lib").glob("*.so")

Deprecations

Introducing this module to the standard library introduces the need to deprecate a number of existing modules and functions. The table below explains which existing functionality must be deprecated.
    PATH METHOD     DEPRECATES FUNCTION
    normcase()      os.path.normcase()
    normpath()      os.path.normpath()
    realpath()      os.path.realpath()
    expanduser()    os.path.expanduser()
    expandvars()    os.path.expandvars()
    dirname()       os.path.dirname()
    basename()      os.path.basename()
    splitpath()     os.path.split()
    splitdrive()    os.path.splitdrive()
    splitext()      os.path.splitext()
    splitunc()      os.path.splitunc()
    joinpath()      os.path.join()
    listdir()       os.listdir() [fnmatch.filter()]
    match()         fnmatch.fnmatch()
    matchcase()     fnmatch.fnmatchcase()
    glob()          glob.glob()
    exists()        os.path.exists()
    isabs()         os.path.isabs()
    isdir()         os.path.isdir()
    isfile()        os.path.isfile()
    islink()        os.path.islink()
    ismount()       os.path.ismount()
    samefile()      os.path.samefile()
    getatime()      os.path.getatime()
    getmtime()      os.path.getmtime()
    getsize()       os.path.getsize()
    cwd()           os.getcwd()
    access()        os.access()
    stat()          os.stat()
    lstat()         os.lstat()
    statvfs()       os.statvfs()
    pathconf()      os.pathconf()
    utime()         os.utime()
    chmod()         os.chmod()
    chown()         os.chown()
    rename()        os.rename()
    renames()       os.renames()
    mkdir()         os.mkdir()
    makedirs()      os.makedirs()
    rmdir()         os.rmdir()
    removedirs()    os.removedirs()
    remove()        os.remove()
    unlink()        os.unlink()
    link()          os.link()
    symlink()       os.symlink()
    readlink()      os.readlink()
    chroot()        os.chroot()
    startfile()     os.startfile()
    copyfile()      shutil.copyfile()
    copymode()      shutil.copymode()
    copystat()      shutil.copystat()
    copy()          shutil.copy()
    copy2()         shutil.copy2()
    copytree()      shutil.copytree()
    move()          shutil.move()
    rmtree()        shutil.rmtree()

The Path class deprecates the whole of os.path, shutil, fnmatch and glob. A big chunk of os is also deprecated.

Open Issues

Some functionality of Jason Orendorff's path module has been omitted:

* Function for opening a path - better handled by the builtin open().
* Functions for reading and writing a whole file - better handled by file objects' read() and write() methods.
* A chdir() function may be a worthy inclusion.
* A deprecation schedule needs to be set up. How much functionality should Path implement?
  How much of the existing functionality should it deprecate and when?
* Where should the class be placed and what should it be called?

The functions and modules that this new module is trying to replace (os.path, shutil, fnmatch, glob and parts of os) are expected to be available in future Python versions for a long time, to preserve backwards compatibility.

Reference Implementation

Currently, the Path class is implemented as a thin wrapper around the standard library modules sys, os, fnmatch, glob and shutil. The intention of this PEP is to move functionality from the aforementioned modules to Path while they are being deprecated.

For more detail, and diverging implementations, see:

* http://www.jorendorff.com/articles/python/path/path.py
* http://svn.python.org/projects/sandbox/trunk/path/path.py
* http://cafepy.com/quixote_extras/rex/path/enhpath.py

Examples

In this section, "a ==> b" means that b can be used as a replacement for a.

1. Make all python files in a directory executable:

    DIR = '/usr/home/guido/bin'
    for f in os.listdir(DIR):
        if f.endswith('.py'):
            path = os.path.join(DIR, f)
            os.chmod(path, 0755)
    ==>
    for f in Path('/usr/home/guido/bin'):
        f.chmod(0755)

2. Delete emacs backup files:

    def delete_backups(arg, dirname, names):
        for name in names:
            if name.endswith('~'):
                os.remove(os.path.join(dirname, name))
    ==>
    d = Path(os.environ['HOME'])
    for f in d.walkfiles('*~'):
        f.remove()

3. Finding the relative path to a file:

    b = Path('/users/peter/')
    a = Path('/users/peter/synergy/tiki.txt')
    a.relpathto(b)

4. Splitting a path into directory and filename:

    os.path.split("/path/to/foo/bar.txt")
    ==>
    Path("/path/to/foo/bar.txt").splitpath()

5. List all Python scripts in the current directory tree:

    list(Path().walkfiles("*.py"))

6. Create directory paths:

    os.path.join("foo", "bar", "baz")
    ==>
    Path("foo") / "bar" / "baz"

Copyright

This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

-- 
mvh Björn
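The PEP's core design choice, a Path that subclasses str and exposes os.path operations as methods, properties and a / operator, can be illustrated with a small sketch. It is not the reference implementation or Jason Orendorff's path.py: it covers only a handful of the methods listed in the Specification, uses Python 3 spelling (`__truediv__` rather than `__div__`), and simply delegates to os.path and glob.

```python
import glob
import os


class Path(str):
    """Minimal sketch of a str-based Path (not the PEP's reference code)."""

    def __new__(cls, init=os.curdir):
        # Path() with no argument is equivalent to Path(".").
        return str.__new__(cls, init)

    @classmethod
    def cwd(cls):
        # Alternative constructor, replacing os.getcwd().
        return cls(os.getcwd())

    def __truediv__(self, rel):
        # Path("foo") / "bar" replaces os.path.join("foo", "bar").
        return type(self)(os.path.join(self, rel))

    def joinpath(self, *args):
        return type(self)(os.path.join(self, *args))

    @property
    def ext(self):
        # Like os.path.splitext(), the extension keeps its dot: ".gz".
        return os.path.splitext(self)[1]

    def splitpath(self):
        # Same as os.path.split(), but the directory part is a Path.
        head, tail = os.path.split(self)
        return type(self)(head), tail

    def glob(self, pattern):
        # Replaces glob.glob(os.path.join(directory, pattern)).
        return [type(self)(p) for p in glob.glob(os.path.join(self, pattern))]
```

Because every result is still a str, it can be passed straight to open() or os.listdir(), which is the compatibility argument made in the Rationale.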
BJörn Lindqvist <bjourne@gmail.com> wrote:
1. Make all python files in a directory executable:
[...]
Iterating over a path string to read the contents of the directory possibly pointed to by that string seems like "magic implicit" behaviour. Perhaps making it a method explicitly returning an iterator would be more Pythonic?

    for f in Path(...).readDir():
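A minimal sketch of Charles's alternative, assuming a str-based Path: directory listing becomes an explicit generator method, while iterating the object itself keeps ordinary str behaviour (characters). The method name `iterdir` is hypothetical, chosen here in place of `readDir` to follow lowercase method naming.

```python
import os


class Path(str):
    """Sketch: directory listing is an explicit method, not __iter__."""

    def iterdir(self):
        # Lazily yield a Path for each entry in the directory.
        for name in os.listdir(self):
            yield type(self)(os.path.join(self, name))
```

With this design, `for f in Path(d).iterdir()` reads the directory, and `list(Path("ab"))` is still `['a', 'b']`, exactly as for any other string.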
4. Splitting a path into directory and filename:
[...]
Path("/path/to/foo/bar.txt").splitpath()
Good. But the opposite isn't done similarly:
6. Create directory paths:
[...]
Path("foo") / "bar" / "baz"
Using "/" as "path concatenation operator" seems like un-Pythonic magic as well (while "+" would be an improvement, it's still not a large one). I would think Path('foo').appendparts('bar', 'baz') or similar would be more readable and obvious. Charles -- ----------------------------------------------------------------------- Charles Cazabon <python@discworld.dyndns.org> GPL'ed software available at: http://pyropus.ca/software/ -----------------------------------------------------------------------
Charles Cazabon wrote:
I believe .listdir() exists already as a method alternative. I'm -0 on iteration as listdir. Doing iteration like strings (over the characters) would be evil.
.joinpath() does this. Though .join() does something else entirely that it inherits from strings, something evil to do with a path, and I think that method should raise NotImplementedError. + should not be overridden, because strings define that. Some other str methods are harmless but pointless: center, expandtabs, ljust, zfill; title, capitalize, and istitle are iffy. Also, are there any particular cases where string methods on a path produce an actual str, not another Path object? -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
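Ian's question about which string methods return a plain str can be checked against a bare str subclass. This sketch is not path.py (which may override some of these); it only shows CPython's default behaviour for a subclass that overrides nothing but `/`.

```python
import os


class Path(str):
    # Bare str subclass standing in for a str-based Path (sketch only).
    def __truediv__(self, rel):
        return Path(os.path.join(self, rel))


p = Path("/usr")

# Iteration walks characters, because that is what str.__iter__ does.
chars = list(p)              # ['/', 'u', 's', 'r']

# str.join() is inherited unchanged: p is used as the *separator*.
joined = p.join(["a", "b"])  # 'a/usrb', a plain str, not a path join

# Inherited str methods and + return plain str, not Path.
upper = p.upper()            # '/USR', type str
concat = p + "/lib"          # '/usr/lib', type str

# Only methods the subclass overrides keep the Path type.
child = p / "lib"            # type Path
```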
Thanks for doing this. I'm not sure anyone that matters here is actually keen on path, but I guess we'll see. A few comments: On 1/24/06, BJörn Lindqvist <bjourne@gmail.com> wrote:
Actually, I would prefer a Path that *didn't* subclass string, and a new "%p" format-thingy in PyArg_ParseTuple(). %p would expect either a Path object or a string. Stdlib C functions that take paths would be changed from using %s or %u to %p. This would pretty much eliminate the need for path objects to act like strings (except where __cmp__, __hash__, and common sense dictate). The only reason I didn't do this in path.py is that I don't have the required write access to the Python source tree. ;) Subclassing str/unicode seemed like the next best thing.
Aside: I added this to support a few people who liked the idea of "openable objects", meaning anything that has .open(), analogous to "writeable objects" being anything with .write(). I don't use it personally.

Examples 1 and 2 have errors. In example 1, the "after" code should be:

    d = path('/usr/home/guido/bin')
    for f in d.files('*.py'):
        f.chmod(0755)

In example 2, the "before" code is missing a line -- the call to os.path.walk(). (Actually it should probably use os.walk(), which looks much nicer.)

I suspect you'll be asked to change the PEP to remove __div__ for starters, in which case I propose using the Path constructor as the replacement for os.path.join(). In that case, Path.joinpath can be dropped.

-j
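A runnable sketch of example 2's "before" code, rewritten with os.walk() as Jason suggests. Returning the list of removed paths is an addition for checking the result; it is not part of the original example.

```python
import os


def delete_backups(top):
    """Remove every file ending in '~' below top; return the removed paths."""
    removed = []
    # os.walk lists each directory before yielding it, so deleting
    # files inside the loop does not disturb the traversal.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith('~'):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```

Called as `delete_backups(os.environ['HOME'])`, it covers the same ground as the Path version's `d.walkfiles('*~')` loop.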
Jason Orendorff wrote:
Losing .open() would make it much harder for anyone wanting to write, say, a URI library that implements the Path API.
I'm -1 on this too. This means people will be hardcoding the specific class they expect, so you can't pass in other classes. E.g., this will fail:

    def read_config(home_dir):
        f = open(Path(home_dir, '.config_file'))
        c = f.read()
        f.close()
        return c

    read_config(URI('http://localhost/foo'))

I'm personally +1 on /. I think people who think it is confusing are giving a knee-jerk reaction. It's very easy to tell the difference between file-related code and math-related code, and / is currently only used for math. In contrast, + is used for concatenation and addition, and these are far more ambiguous from context -- but still it doesn't cause that many problems. But barring /, .joinpath() should remain. Too bad there isn't a shorter name, except .join() which is taken.

I would also, though, note that there are some security issues. When you use open() or other path-related functions, you *know* you are doing filesystem operations. You can't be getting some weird object that does who-knows-what. For the most part if you can get an arbitrary object into the system then the system is just broken, but I can imagine cases where this encourages people to do bad things. I only bring this up because it reminds me of PHP allowed over-the-net includes, which always seemed like a bad idea.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
Ian Bicking wrote:
It occurs to me that it might be hopeless to expect substitution to work generally (at least without a specific thought on the matter) because I expect this form will be typical:

    def read_config(path):
        # convert string input to a path (has no effect on Path objects):
        path = Path(path)
        content = path.text()

Since people will be passing strings in to file-related functions for the foreseeable future, they will coerce that input to paths explicitly whenever they accept a path from a public function. Now, if there were a way to make sure that "Path(x) is x" is true when x is already a Path, and maybe a way to coerce strings to a Path without coercing Path-like objects into Path objects, that would help resolve the problem.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
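One way to get `Path(x) is x`, sketched under the assumption of a str-based Path: override `__new__` so an existing Path is handed back unchanged. The `as_path` helper is hypothetical; it converts plain strings only and passes any other object (a Path, or a Path-like object such as a URI class) through untouched.

```python
import os


class Path(str):
    """Sketch: constructing a Path from a Path returns the same object."""

    def __new__(cls, init=os.curdir):
        if isinstance(init, cls):
            # Already a Path (or a subclass of it): reuse it as-is.
            return init
        return str.__new__(cls, init)


def as_path(obj):
    """Hypothetical helper: coerce plain strings, leave everything else alone."""
    if isinstance(obj, str) and not isinstance(obj, Path):
        return Path(obj)
    return obj
```

Because the check is in `__new__`, the `path = Path(path)` line in `read_config` above becomes free for callers who already pass Path objects.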
John J Lee wrote:
My example shows this more clearly I think:

    def read_config(path):
        text = path.open().read()
        ... do something ...

If I implement a URI object with an .open() method, then I can use it with this function, even though read_config() was written with file paths in mind. But without it that won't work:

    def read_config(path):
        text = open(path).read()
        ...

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
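The duck typing Ian describes can be shown with two hypothetical classes: `FilePath` wraps the built-in open(), and `FakeURI` stands in for a URI object by serving fixed text from memory instead of fetching anything. `read_config` only calls `.open()`, so it accepts either.

```python
import io


class FilePath(str):
    """Hypothetical filesystem path exposing .open()."""

    def open(self, mode="r"):
        # Inside the method, "open" still resolves to the built-in.
        return open(self, mode)


class FakeURI:
    """Hypothetical URI stand-in: returns fixed text instead of fetching."""

    def __init__(self, url, body):
        self.url = url
        self._body = body

    def open(self, mode="r"):
        return io.StringIO(self._body)


def read_config(path):
    # Relies only on path.open(), so any "openable" object works.
    with path.open() as f:
        return f.read()
```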
[Ian Bicking]
Losing .open() would make it much harder for anyone wanting to write, say, a URI library that implements the Path API.
[John]
Why? Could you expand a bit?
What's wrong with urlopen(filesystem_path_instance) ?
[Ian]
I should have expected that answer, but couldn't believe that you think it's a good idea to implement that obese filesystem path API for URLs ;-) Shouldn't we instead have: a) .open()-able objects blessed in the stdlib & stdlib docs, as a separate interface from the path interface (I guess that would be an argument in favour of path implementing that one-method interface, as long as it's not tied too tightly to the fat path interface) b) a path object with a thinner interface (I know you've already expressed that preference yourself...)? John
Good stuff. Some suggestions:
def joinpath(self, *args): ...
I suggest append() or extend(), as join*() sort of suggests join() as provided by strings, which does something quite different.
def splitall(self): ...
and this may be renamed split(), as it is quite similar to split() as provided by strings
# Properties about the path. parent, name, namebase, ext, drive, uncshare[1]
so we can drop basename(), dirname(), splitdrive(), and splitext()
def dirs(self, pattern = None): ... def files(self, pattern = None): ...
can we add others()? (sockets, pipes, block and character devices) --eric
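A sketch of what Eric's others() could look like as a plain function, using the stat module's S_ISSOCK, S_ISFIFO, S_ISBLK and S_ISCHR tests. Following symlinks (os.stat rather than os.lstat) and skipping entries that cannot be stat'ed, such as dangling links, are assumptions of this sketch, chosen to mirror how isdir() and isfile() behave.

```python
import os
import stat

# (predicate, label) pairs for the non-file, non-directory kinds Eric lists.
KINDS = [
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISFIFO, "pipe"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISCHR, "character device"),
]


def others(directory):
    """Return (path, kind) for entries that are neither dirs nor regular files."""
    result = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            mode = os.stat(path).st_mode  # follows symlinks, like isdir()/isfile()
        except OSError:
            continue  # e.g. a dangling symlink
        for is_kind, label in KINDS:
            if is_kind(mode):
                result.append((path, label))
                break
    return result
```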
On Tue, 2006-01-24 at 21:22 +0100, BJörn Lindqvist wrote: [...]
[...etc...] If we wanted to take PEP 8 seriously, those method names should be changed to words_separated_by_underscores. And BTW, what does splitunc do? It really should have a more descriptive name. Regards. -- Gustavo J. A. M. Carneiro <gjc@inescporto.pt> <gustavo@users.sourceforge.net> The universe is always one step beyond logic
[Gustavo J. A. M. Carneiro wrote]
And BTW, what does splitunc do?
http://en.wikipedia.org/wiki/Path_%28computing%29#Universal_Naming_Conventio...
It really should have a more descriptive name.
No more than "urllib" or "splitext" should. Trent -- Trent Mick TrentM@ActiveState.com
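splitunc() splits a Windows UNC path (\\server\share\...) into its mount point, the server plus share name, and the rest of the path. It was later deprecated in favour of ntpath.splitdrive(), which understands UNC paths; since ntpath can be imported on any platform, the split can be demonstrated anywhere:

```python
import ntpath

# ntpath implements Windows path rules on every OS, so this runs on POSIX too.
unc = r"\\host\share\dir\file.txt"
drive, rest = ntpath.splitdrive(unc)
# drive is the UNC mount point: \\host\share
# rest is everything after it:  \dir\file.txt
```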
Gustavo J. A. M. Carneiro wrote:
There's an (unspecified?) convention that many standard/core objects or objects in the standard library use squishedwords for methods. has_key is an anomaly, not the norm. Also, many of these are direct translations of methods from os.path, and so the names offer familiarity. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
BJörn Lindqvist wrote:
* Functions for reading and writing a whole file - better handled by file objects read() and write() methods.
I would be disappointed to see this left out, because I get really tired of this little dance:

    f = open(filename)
    c = f.read()
    f.close()
    return c

Then you can put a try:finally: in there too for extra correctness. Anyway, I write this code all the time, and it's really tiresome. open(filename).read() also works, but relies on Python's reference counting to be really reliable; maybe I shouldn't worry about that and just use that form. But that bothers me too. The same occurs during writing. The original Path object has bytes() and text() methods (I think), which nicely distinguish between the two cases. This helps suggest better and more explicit handling of unicode, something that Python should work harder at making clear.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
Ian Bicking wrote:
Python 2.5 (well, once someone finds time to update mwh's patch):

    with open(filename) as f:
        return f.read()

Behaviour guaranteed by language definition ;)

Cheers,
Nick.

P.S. I too would really like to see this happen for 2.5.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
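Nick's with statement also makes the read-a-whole-file helpers Ian misses trivial to write. A sketch, assuming Python 3 file semantics; the function names `read_text`, `write_text` and `read_bytes` are hypothetical, loosely modelled on path.py's text() and bytes(), and the explicit `encoding` parameter reflects Ian's point about unicode.

```python
def read_text(filename, encoding="utf-8"):
    # The with block closes the file even if read() raises.
    with open(filename, encoding=encoding) as f:
        return f.read()


def write_text(filename, text, encoding="utf-8"):
    with open(filename, "w", encoding=encoding) as f:
        f.write(text)


def read_bytes(filename):
    # Binary mode: no decoding, no newline translation.
    with open(filename, "rb") as f:
        return f.read()
```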
On Tue, Jan 24, 2006 at 09:22:01PM +0100, BJörn Lindqvist wrote:
Path("foo") / "bar" / "baz"
I really love this! But I am afraid it's too much a Unixism. (-: Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
It would be great (and save a lot of rehashing) if you could go over the python-dev discussion and add the relevant parts (for example, whether to include the __div__ hack) to the PEP: <http://mail.python.org/pipermail/python-dev/2005-June/054439.html> =Tony.Meyer
[Tony Meyer]
In particular the points about Path being able to be a drop-in replacement for str/unicode are useful ones, and explain the use of joinpath() etc. It is really useful that I can use a Path anywhere I might have used an str and not have to worry about the conversions. -- Michael Hoffman <hoffman@ebi.ac.uk> European Bioinformatics Institute
There are kind of a lot of methods in here, which is a little bothersome. It also points towards the motivation for the class -- too many options in too many places in the stdlib. But throwing them *all* in one class consolidates but doesn't simplify, especially with duplicate functionality. I'm not strongly advocating any of the methods below be removed, but at least it seems like it should be discussed. By not removing any it is easier to translate the os.path (and other) forms. I imagine it will be a long time before those forms go away, though, so I don't know how useful it is to plan for a speedy and seamless transition. BJörn Lindqvist wrote:
This is equivalent to p.__class__(string.Template(p).safe_substitute(os.environ)). Obviously that form is a lot longer, but maybe usefully more explicit. Well, it is a *lot* longer. But if string.Template's functionality becomes a method on str (is that the plan?) then this won't be so bad. Also if string.Template returns an object of the same class as is passed in. Then maybe it'd just be p.safe_substitute(os.environ), which isn't bad at all. Maybe if this used Windows conventions on that platform -- of %VAR% -- it would seem more useful. Though I think $VAR should still work on both platforms regardless (just like / does).
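Ian's equivalence can be checked directly. The sketch uses posixpath.expandvars so the result is the same on every platform (ntpath.expandvars additionally understands %VAR%), sets one demo environment variable and makes sure a second is unset. For ordinary $VAR and ${VAR} references the two spellings agree, but they differ on `$$`, which string.Template treats as an escaped dollar sign.

```python
import os
import posixpath
from string import Template

os.environ["PEP_DEMO"] = "value"        # demo variable for this example only
os.environ.pop("PEP_UNSET_DEMO", None)  # make sure this one is undefined

s = "$PEP_DEMO/${PEP_DEMO}/$PEP_UNSET_DEMO"
a = posixpath.expandvars(s)                  # 'value/value/$PEP_UNSET_DEMO'
b = Template(s).safe_substitute(os.environ)  # same: unknown names are left alone

# "$$" is where they diverge.
c = posixpath.expandvars("$$PEP_DEMO")                  # '$value'
d = Template("$$PEP_DEMO").safe_substitute(os.environ)  # '$PEP_DEMO'
```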
def dirname(self): ... def basename(self): ...
These are duplicated as properties. basename and namebase are confusing alternatives to each other.
def expand(self): ...
I see this is a combination of normpath, expanduser, and expandvars. Useful, certainly. But there are also a lot of forms, and it seems no one applies these operations consistently.
This is another new method, equivalent to .splitext()[0]. I'm not sure it's that important.
And there's just so many splitting functions. Could these somehow be combined? Maybe returning a tuple/struct? Or maybe just relying on properties.
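A sketch of the tuple/struct idea: split once and expose every piece by name. `PathParts` and `parts()` are hypothetical; the field names reuse the PEP's properties parent, name, namebase and ext, and posixpath is used so the results are platform independent.

```python
import posixpath
from collections import namedtuple

# Hypothetical record bundling what split(), basename() and splitext() return.
PathParts = namedtuple("PathParts", "parent name namebase ext")


def parts(path, mod=posixpath):
    """Split a path once; pass mod=ntpath for Windows-style paths."""
    parent, name = mod.split(path)
    namebase, ext = mod.splitext(name)
    return PathParts(parent, name, namebase, ext)
```

For example, `parts("/path/to/foo/bar.txt")` gives `PathParts(parent='/path/to/foo', name='bar.txt', namebase='bar', ext='.txt')`.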
def relpath(self): ... def relpathto(self, dest): ...
These don't feel compellingly different according to the name. I find the cwd fragile too, so maybe the first form offends me from that perspective too. Just the explicit form feels sufficient to me, and unambiguous as both a reader and writer of code.
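The explicit form Ian prefers is what the standard library later provided as os.path.relpath(path, start), added in Python 2.6. A short demonstration with the paths from example 3, using posixpath for platform independence; when `start` is omitted, relpath falls back to the current directory, which is the cwd dependence Ian finds fragile.

```python
import posixpath

target = "/users/peter/synergy/tiki.txt"
base = "/users/peter"

# Both ends are named, so the result does not depend on os.getcwd().
down = posixpath.relpath(target, base)  # 'synergy/tiki.txt'

# start is always treated as a directory, so going the other way
# climbs out of 'tiki.txt' as well as 'synergy'.
up = posixpath.relpath(base, target)    # '../..'
```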
# Properties about the path. parent, name, namebase, ext, drive, uncshare[1]
Actually, I see namebase is the name without an extension. It seems ambiguous to me just from the names, and I'd rather both weren't there. Though ext somehow seems useful and unambiguous in a way namebase isn't. Not sure why. It's unclear which of these should be Paths. Of course parent should. None of the others? When methods return paths and when they return strings is an important part of the spec.
Notably these aren't like os.path.walk, I assume. Which is fine by me.
def match(self, pattern): def matchcase(self, pattern):
I don't see these methods in the path class, and I'm not sure what they'd do.
The stat and get* functions overlap too. I.e., p.getmtime() and p.stat().st_mtime are the same. Too bad about the st_* names on stat objects, otherwise I don't see any problem with using that directly. It still seems clearer.
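Ian's overlap is literal: os.path.getmtime() and getsize() are thin wrappers that call os.stat() and return a single field. A quick check on a temporary file:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"12345")
    name = f.name

st = os.stat(name)  # one system call yields every field
same_mtime = os.path.getmtime(name) == st.st_mtime
same_size = os.path.getsize(name) == st.st_size
os.remove(name)
```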
def pathconf(self, name): ... [1]
I can't figure out what this does, even from the docs. Some of these seem obscure enough they could be left in os.
Mmm... then these show up yet again.
Dups in the spec.
Like pathconf, maybe these don't need to be moved into the module, and can be left in os. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
On 1/24/06, Ian Bicking <ianb@colorstudy.com> wrote:
I agree. Let me explain why there's so much cruft in path.py. The design is heavily skewed toward people already very familiar with the existing stdlib equivalents, because that is the market for third-party modules. I think my users want to type p.methodname() for whatever method name they already know, and have it *just work*. A sloppy API you've already learned is easier to pick up than a clean API you've never seen before. Ergo, cruft. A stdlib Path should have different design goals. It should have less redundancy, fewer methods overall, and PEP-8-compliant names. -j
My comments on the issues. It was easier this way than trying to reply to every message individually.

Inheritance from string (Jason)

This issue has been brought up before when people were discussing the path module. I think the consensus is that, while the inheritance isn't pure, practicality beats purity. Not extending string would require too many changes to Python and would break too much existing code. I'll add this to Resolved Issues if nobody minds.

* http://mail.python.org/pipermail/python-dev/2001-August/016663.html
* http://mail.python.org/pipermail/python-dev/2001-August/016665.html
* http://mail.python.org/pipermail/python-list/2005-July/291091.html
* http://mail.python.org/pipermail/python-list/2005-July/291152.html

Remove __div__ (Ian, Jason, Michael, Oleg)

This is one of those where everyone (me too) says "I don't care either way." If that is so, then I see no reason to change it unless someone can show a scenario in which it hurts readability. Plus, a few people have said that they like the shortcut.

* http://mail.python.org/pipermail/python-list/2005-July/292251.html
* http://mail.python.org/pipermail/python-dev/2005-June/054496.html
* http://mail.python.org/pipermail/python-list/2005-July/291628.html
* http://mail.python.org/pipermail/python-list/2005-July/291621.html

Remove redundant methods (Eric, Ian, Jason, Ronald, Toby)

I think it is a good idea to take out some of Path's methods, especially the ones that exist as both methods and properties. I have updated the PEP and dumped basename(), dirname(), splitdrive() and splitext(). I think that glob() should also go away, because I can't think, off the top of my head, of a scenario where it would be more suitable than listdir(), dirs() or files(). Plus, for contrived examples like glob.glob("/*bin/*foo*"), the Path way doesn't look so good: Path("/").glob("*bin/*foo*").

Renaming methods because of PEP 8 (Gustavo, Ian, Jason)

I'm personally not keen on that. I like most of the names as they are.
abspath(), joinpath(), realpath() and splitall() look so much better than abs_path(), join_path(), real_path() and split_all() in my eyes. If anyone likes the underscores I'll add it to Open Issues.

Removing open() and methods that manipulate the whole file at once (Ian, Jason)

I think this is settled with the addition of the with statement? My idea when scrubbing these methods was that it would make it easier to get the PEP accepted. However, even with with, these methods save some typing.

* http://mail.python.org/pipermail/python-dev/2005-June/054439.html
* http://mail.python.org/pipermail/python-list/2005-July/291435.html

?time properties and get?time() methods

Clearly, Path should have either the properties or the methods, not both at once. Yet another "I don't care either way."

* http://mail.python.org/pipermail/python-dev/2005-June/054439.html
* http://mail.python.org/pipermail/python-list/2005-July/291424.html
* http://mail.python.org/pipermail/python-list/2005-July/291460.html

I have also corrected the copy-paste errors I inadvertently introduced. Path should *not* have an __iter__. :)

* match() and matchcase() wrap the fnmatch.fnmatch() and fnmatch.fnmatchcase() functions. I believe that the renaming is uncontroversial and that the introduction of matchcase() makes it possible to deprecate the whole fnmatch module.

I have created an implementation of Path that corresponds to the specification in the PEP (which I hope will appear on www.python.org/peps soon). It is based on Reinhold's (Georg Brandl) implementation from pre-PEP threads in c.l.p last summer. But I have no place to upload it. I would also like it if some people want to co-author this PEP with me - it's really neither my PEP nor my module.

-- 
mvh Björn
On Wed, Jan 25, 2006 at 09:37:04PM +0100, BJörn Lindqvist wrote:
Remove __div__ (Ian, Jason, Michael, Oleg)
I didn't say "remove". Exactly opposite - I am enamoured by the beauty of the syntax! (-: Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
BJörn Lindqvist wrote:
The renaming is fine with me. I generally use the fnmatch module for wildcard matching, not necessarily against path names. Path.match doesn't replace that functionality. And fnmatch.translate, the function I actually tend to use, isn't even publicly documented. Though it seems a little confusing to me that glob treats separators specially, and that's not implemented at the fnmatch level. So Path('/a/b/d/c').match('a/*/d') is true, but Path('/').walk('a/*/d') won't return Path('/a/b/c/d'). I think .match() should be fixed. But I don't think fnmatch should be changed. I'm actually finding myself a little confused by the glob arguments (if the glob contains '/'), now that I really think about them. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
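Ian's point about separators can be made concrete: fnmatch's `*` matches any characters, including '/', while glob matches one path component at a time. `match_segments` below is a hypothetical component-wise matcher in the glob style, not an existing API; it also strips leading and trailing slashes, so it ignores the absolute/relative distinction.

```python
import fnmatch


def match_segments(path, pattern):
    """Hypothetical glob-style match where '*' cannot cross a '/'."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    # Match each component separately, as glob does.
    return all(fnmatch.fnmatchcase(p, pat)
               for p, pat in zip(path_parts, pattern_parts))


loose = fnmatch.fnmatchcase("a/b/c/d", "a/*/d")  # True: '*' swallowed 'b/c'
strict = match_segments("a/b/c/d", "a/*/d")       # False: '*' is one component
strict_ok = match_segments("a/b/d", "a/*/d")      # True
```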
Well, if you include the much larger discussion on python-list, people (including me) have said that removing __div__ is a good idea. If it's included in the PEP, please at least include a justification and cover the problems with it. The vast majority of people (at least at the time) were either +0 or -0, not +1. +0's are not justification for including something.

Against it:

* Zen: Beautiful is better than ugly. Explicit is better than implicit. Readability counts. There should be one-- and preferably only one --obvious way to do it.
* Not every platform that Python supports has '/' as the path separator. Windows, a pretty major one, has '\'. I have no idea what various portable devices use, but there's a reasonable chance it's not '/'.
* It's being used to mean "join", which is the exact opposite of /'s other meaning ("divide").
* Python's not Perl. We like using functions and not symbols.
+1 to following PEP 8. These aren't built-ins, it's a library module. In addition to the PEP, underscores make it much easier to read, especially for those for whom English is not their first language. =Tony.Meyer
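On the separator point: the Windows flavour of os.path, ntpath, accepts '/' as well as a backslash, joins with a backslash, and can be imported on any platform. A sketch of per-flavour path classes built on that; the class names `PosixPath` and `NTPath` are hypothetical and the classes implement only `/` and a `name` property.

```python
import ntpath
import posixpath


class _FlavouredPath(str):
    """Sketch: a str path bound to one os.path-style module."""

    module = posixpath  # overridden by each flavour

    def __truediv__(self, rel):
        return type(self)(self.module.join(self, rel))

    @property
    def name(self):
        return self.module.basename(self)


class PosixPath(_FlavouredPath):
    module = posixpath


class NTPath(_FlavouredPath):
    module = ntpath


# ntpath treats '/' as a separator and normalises it:
# ntpath.normpath("C:/data/logs") gives "C:\\data\\logs"
```

Either class can be used from any OS, for example to build Windows paths on a POSIX build server.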
At 11:25 AM 1/26/2006 +1300, Tony Meyer wrote:
"/" also works on Windows, and the Python distutils already set the precedent of requiring /-separated paths on *all* platforms, converting them to os.sep behind the scenes. I'd also note that using the / operator seems to me to be a big win on "beautiful is better than ugly". Path-joining code is mighty ugly without it, and / is more readable as well. It'd be nice to see the urllib modules grow a URL type supporting this operator, among other path operators. I would also suggest that as with the individual posixpath, ntpath, etc. libraries today, we should be able to import NTPath and PosixPath classes directly from those modules, for code that needs to manipulate a path for some system other than the one it's running on.
Phillip J. Eby <pje@telecommunity.com> wrote:
I'd also note that using the / operator seems to me to be a big win on "beautiful is better than ugly".
It screams "magic" in a very un-Pythonic (and possibly very Perl-like) way. I'm not aware of any other part of the standard library grossly abusing standard operators in this way. As others have noted, "/" is being used here to mean precisely the opposite of what it means in every other use in Python, which alone should be justification for getting rid of it. Charles -- ----------------------------------------------------------------------- Charles Cazabon <python@discworld.dyndns.org> GPL'ed software available at: http://pyropus.ca/software/ -----------------------------------------------------------------------
[Charles Cazabon]
I think the use of the modulo operator for string substitution is pretty comparable, despite it being in the interpreter rather than in the stdlib. And some of us have come to love that, too. -- Michael Hoffman <hoffman@ebi.ac.uk> European Bioinformatics Institute
Tony Meyer wrote:
If it were possible to use .join() for joining paths, I think I wouldn't mind so much. But reusing a string method for something very different seems like a bad idea. So we're left with .joinpath(). Still better than os.path.join() I guess, but only a little. I guess that's why I'm +1 on /.
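To make the clash concrete: a str subclass that adds nothing inherits `str.join`, which uses the path as a *separator*, whereas `os.path.join` appends components. `NaivePath` is a hypothetical class for illustration; `posixpath` stands in for `os.path` so the output is the same on every platform.

```python
import posixpath

class NaivePath(str):
    """Hypothetical str subclass that adds nothing, so it inherits str.join unchanged."""

p = NaivePath('/usr')
# str.join uses p as the separator between the items.
print(p.join(['lib', 'bin']))    # lib/usrbin
# os.path.join (posixpath.join here) appends components to p.
print(posixpath.join(p, 'lib'))  # /usr/lib
```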
I think / is pretty. I think it reads well. There's already some inevitable redundancy in this interface. I use os.path.join so much that I know anything I use will feel readable quickly, but I also think I'll find / more appealing.
I believe all platforms support /; at least Windows and Mac do, in addition to their native separators. I assume any platform that supports filesystem access will support / in Python. If anything, a good shortcut for .joinpath() will at least encourage people to use it, thus discouraging hardcoding of path separators. I expect it would encourage portable paths. Though Path('/foo') / '/bar' == Path('/bar'), which is *not* intuitive, though in the context of "join" it's not as surprising. So that is a problem. If / meant "under this path" then that could be a useful operator (in that I'd really like such an operator or method). Either paths would be forced to be under the original path, or it would be an error if they somehow escaped. Currently there's no quick-and-easy way to ensure this, except to join the paths, do abspath(), then confirm that the new path starts with the old path.
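One way to get the "under this path" join Ian asks for is a small helper along the lines he outlines (join, take `abspath()`, then check containment). This is a hypothetical sketch, assuming POSIX paths via `posixpath`; it uses `commonpath()` instead of a `startswith()` check so that `/srv/www2` does not count as being under `/srv/www`, and, like `abspath()`, it does not resolve symlinks.

```python
import posixpath

def join_under(base, *parts):
    """Hypothetical 'under this path' join: raise ValueError if the result escapes base."""
    base = posixpath.abspath(base)
    candidate = posixpath.abspath(posixpath.join(base, *parts))
    # commonpath compares whole components, unlike a plain startswith() check.
    if posixpath.commonpath([base, candidate]) != base:
        raise ValueError(f'{candidate!r} escapes {base!r}')
    return candidate


print(posixpath.join('/foo', '/bar'))               # /bar  (the second argument wins)
print(join_under('/srv/www', 'css', 'site.css'))    # /srv/www/css/site.css
print('/srv/www2/x'.startswith('/srv/www'))         # True: why startswith() is not enough
try:
    join_under('/srv/www', '../www2/x')
except ValueError as exc:
    print(exc)
```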
A little too heavy on the truisms. Python isn't the anti-Perl.
I don't find abs_path() much easier to read than abspath() -- neither is a full name. absolute_path() perhaps, but that is somewhat redundant; absolute()...? Eh. Precedent in naming means something, and in this case all the names have existed for a very long time (as long as Python?). PEP 8 encourages following naming precedent. While I don't see a need to match every existing function with a method, to the degree they do match I see no reason why we shouldn't keep the names. And I see reasons why the names shouldn't be changed. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
[Ian Bicking]
Why does reusing a string method for something very different seem like a bad idea, but reusing a mathematical operator for something very different seem like a good idea? Paths aren't strings, so join() seems the logical choice. (There are also alternatives to "joinpath" if the name is the thing: add(), for example.) [Tony Meyer]
I suppose that the only beholder's eye that matters here is Guido's. (It still violates explicit/implicit and only-one-way. Not rules, of course, but good guidelines.)
There's already some inevitable redundancy in this interface.
That's hardly a reason to encourage *more*. If anything, it's a reason to try for less, where possible.
This is not strictly true. Using '/' can lead to strange results with Windows, where it gets interpreted as a command-line switch instead. It's not reliable, it's not the path separator that Windows users/developers understand, and it's not the correct (i.e. according to Microsoft) path separator. If by Mac you mean OS X, then that's just another *nix-based OS. I'm pretty sure that pre-OS X (which isn't supported any more anyway, right?) '/' was not, in fact, supported, and that ":" was required. I also believe it's important to remember that Windows and *nix descendants are not "all platforms".
I'm not sure that I believe that the reason that people don't type "os.path.join('a', 'b')" is because they are too lazy to type it. However, I don't have any evidence either way, so it could be true. [re: PEP8 following]
PEP 8 encourages following naming precedent within a module, doesn't it? Guido has said that he'd like to have the standard library tidied up, at least somewhat (e.g. StringIO.StringIO -> stringio.StringIO) for Python 3000. It would make it less painful if new additions already followed the plan. =Tony.Meyer
On Jan 25, 2006, at 3:42 PM, Tony Meyer wrote:
join() is already defined for strings, division is not. Different namespace... just like + is concatenation for list+list, tuple+tuple, basestring+basestring, but it's addition for numbers...
Mac OS X understands '/' as the separator at the POSIX layer, but ':' as the path separator in the Carbon API (which is only used in obscure places from Python). Earlier versions of Mac OS are no longer supported, and you're right -- they only understood ':' as a path separator.
In many cases, when I know I only care about *nix, I will use 'a/b' instead of os.path.join because it's just so much more concise and more obvious. The only times I use os.path.join are when I don't know if there will be a trailing slash or not, or if I'm actively trying to make something cross-platform. -bob
On Thu, 26 Jan 2006, Tony Meyer wrote: [...]
That's easy -- it's because, if you're going to use a name, people expect (with some level of trust) that you'll pick a good one. But people understand that there are only a few operators to use, so the meaning of operators is naturally more overloaded than that of method names. John
Tony Meyer wrote:
Paths are strings; that's in the PEP. As an aside, I think it should be specified what (if any) string methods won't be inherited by Path (or will be specifically disabled by making them throw some exception). I think .join() and __iter__ at least should be disabled.
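A sketch of what "disabling" inherited string methods could look like, assuming that raising `TypeError` is the chosen behaviour. `StrictPath` is a hypothetical name; note that operations which don't go through `__iter__`, such as `in`, keep their str behaviour.

```python
class StrictPath(str):
    """Hypothetical str subclass that disables str behaviour with no sensible path meaning."""

    def join(self, *args, **kwargs):
        raise TypeError("StrictPath.join is disabled; use os.path.join() instead")

    def __iter__(self):
        raise TypeError("StrictPath objects are not iterable")


p = StrictPath('/usr/lib')
print(p.upper())          # other str methods still work: /USR/LIB
try:
    list(p)
except TypeError as exc:
    print(exc)            # StrictPath objects are not iterable
```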
I think the use of underscores or squished words isn't as big a deal as the case of modules. It's often rather ambiguous what a "word" really is. At least in English, word combinations slowly and ambiguously float towards being combined. So abspath and abs_path both feel sufficiently inside the scope of PEP 8 that precedent is worth maintaining. rfc822's getallmatchingheaders method was going too far, but a little squishing doesn't bother me, if it is consistent (and it's actually easier to be consistent about squishing than underscores). -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
On Wed, 2006-01-25 at 18:10 -0600, Ian Bicking wrote:
Whenever I see derived classes deliberately disabling base class methods, I see red flags that something in the design of the hierarchy isn't right. While I understand that you want to be able to use Path instances anywhere a string is currently used, I'm not sure that deriving from str is the right thing. Maybe deriving from basestring would be better, but even then I'm not sure. Is it possible that we don't need Path objects to be interchangeable with strings, but just that we can get away with expanding a few critical functions (such as open())?
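For what it's worth, Python 3.6 eventually took roughly this route with the `os.PathLike` protocol (PEP 519): `open()` and most `os` functions accept any object with a `__fspath__()` method, with no str subclassing required. The `FsPath` class below is a hypothetical minimal example of that protocol, not a proposal for the PEP's `Path` API.

```python
import os
import tempfile

class FsPath:
    """Hypothetical non-str path class that relies on the os.PathLike protocol."""

    def __init__(self, *parts):
        self._path = os.path.join(*parts)

    def __fspath__(self):
        # open(), os.stat(), os.fspath() etc. call this to get the underlying string.
        return self._path

    def __repr__(self):
        return f'FsPath({self._path!r})'


with tempfile.TemporaryDirectory() as tmp:
    p = FsPath(tmp, 'hello.txt')
    with open(p, 'w') as f:       # open() accepts any os.PathLike object
        f.write('hi')
    print(os.path.getsize(p))     # 2
    print(isinstance(p, str))     # False
```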
For something like "abspath" which is really an abbreviation + word, I suppose squishing them isn't so bad. The alternative is absolute_path() which is certainly more readable if a bit of a pain to write. It's a trade-off that should be made for practical purposes. I've definitely come to prefer spellings like is_absolute over isabsolute, and in general dislike squish words. -Barry
Barry Warsaw wrote:
IMHO the hierarchy problem is a misdesign of strings; iterating over strings is usually a bug, not a deliberately used feature. And it's a particularly annoying bug, leading to weird results. In this case a Path is not a container for characters. Strings aren't containers for characters either -- apparently they are containers for smaller strings, which in turn contain themselves. Paths might be seen as a container for other subpaths, but I think everyone agrees this is too ambiguous and implicit. So there's nothing sensible that __iter__ can do, and having it do something not sensible (just to fill it in with something) does not seem very Pythonic. join is also a funny method that most people wouldn't expect on strings anyway. But putting that aside, the real issue I see is that it is a miscognate for os.path.join, to which it has no relation. And I can't possibly imagine what you'd use it for in the context of a path. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
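A quick illustration of Ian's point, assuming a `Path` that inherits `__iter__` from str: iteration yields characters rather than path components, and each character is itself a string that indexes to itself. `NaivePath` is a hypothetical stand-in.

```python
class NaivePath(str):
    """Hypothetical str subclass that inherits str.__iter__ unchanged."""

p = NaivePath('/usr/lib')
print(list(p)[:4])           # ['/', 'u', 's', 'r']  (characters, not components)
print(p[0] == p[0][0][0])    # True: each item is a string that contains itself
```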
On Wed, 2006-01-25 at 22:30 -0600, Ian Bicking wrote:
Agreed. I've written iteration code that has to special case basestrings before and that's particularly ugly.
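The kind of special-casing Barry mentions typically looks like this: a recursive flatten must treat strings as atoms, because iterating a one-character string yields that same string forever. `flatten` is an illustrative helper, not a standard-library function.

```python
def flatten(items):
    """Yield the leaves of arbitrarily nested iterables, treating strings as atoms."""
    for item in items:
        if isinstance(item, str):
            # Without this check, 'a' -> 'a' -> 'a' ... would recurse forever.
            yield item
        elif hasattr(item, '__iter__'):
            yield from flatten(item)
        else:
            yield item


print(list(flatten(['a', ['bc', ('d', 1)]])))   # ['a', 'bc', 'd', 1]
```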
Good points, but to me that argues against having any inheritance relationship between strings and Paths rather than having such a relationship and disabling certain methods. Thomas's post seemed more on-target for me and I'd like to see that idea fleshed out in more detail. If it's proved to be an impossible (or merely an extremely infeasible) task, then I think we can discuss the shortcut of deriving from strings. It just seems gross so I'd like to be sure there's no better way. -Barry
On Thu, 26 Jan 2006, Tony Meyer wrote: [...]
<bikeshed> FWLIW, I'm definitely +1 on using / as a path join operator.
* It's being used to mean "join", which is the exact opposite of /'s other meaning ("divide").
But it's a very readable way to write a common operation. Perhaps one reason the discrepancy you point out doesn't bother me is that division is the least-used of the +-*/ arithmetic operations. Also, &, | and ^ seem like some sort of precedent, to my brain (if they don't to yours, that's fine and I believe you ;-).
* Python's not Perl. We like using functions and not symbols.
I think this is a tasteful, if not parsimonious, use of a symbol. </bikeshed> John
[John J Lee]
Do you have evidence to back that up? It seems a strange claim. Outside of doing 'maths-y' work, I would think I'd use + most (but for strings), then / (for percentages).
I don't follow this, sorry. You're referring to the bitwise operations? [Ian Bicking]
The problem with these sorts of guesses is that there's no evidence. (Maybe the suggestion that Brett's PhD should collect a corpus of Python scripts was a good one <wink>). Are mathematicians that underrepresented? Is file processing that highly represented? I have no idea. =Tony.Meyer
[John J Lee]
[Tony Meyer]
Do you have evidence to back that up?
No. :) [Ian Bicking]
of mine, and in 12k lines there were 34 uses of join, and 1 use of division. In smaller scripts os.path.join tends to show up a lot more
[Tony]
A second data point: I looked at ~10k lines of physical data analysis code I have lying around -- presumably a relatively rare and extreme example as the Python-world in general goes. Result:

140 occurrences of os.path.join
170 physical lines (as opposed to syntactical lines) containing / as a division operator (very few lines contained > 1 use of '/', so you can multiply 170 by 1.25 to get an upper bound of 213 uses in total)

(To get the second number, I used find and grep heavily but very cautiously, and a final manual count of stubborn lines of grep output with no use of '/' as division operator.) The fact that even in this extreme case os.path.join is close on the tail of '/' strongly backs up Ian's guess that, in most Python code, / as division is rare compared to path joining. Should we deprecate use of '/' and '//' for division in Python 3.0? is-he-joking?-ly y'rs John
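Counting division operators with grep is fiddly because `/` also appears in strings and comments. A sketch of a token-based alternative using the standard `tokenize` module follows; `count_usage` is a hypothetical helper, and it only spots the literal spelling `os.path.join` (not `from os.path import join` or aliases) and ignores `/=` and `//=`.

```python
import io
import tokenize

def count_usage(source):
    """Count '/' and '//' operators and literal os.path.join references in Python source.

    Working on tokens means a '/' inside a string or comment is never counted.
    """
    tokens = [tok.string
              for tok in tokenize.generate_tokens(io.StringIO(source).readline)
              if tok.type in (tokenize.NAME, tokenize.OP)]
    # Augmented forms ('/=', '//=') are deliberately not counted here.
    divisions = sum(1 for t in tokens if t in ('/', '//'))
    joins = sum(1 for i in range(len(tokens) - 4)
                if tokens[i:i + 5] == ['os', '.', 'path', '.', 'join'])
    return divisions, joins


sample = '''import os
ratio = total / count   # a comment with a / in it
half = total // 2
cfg = os.path.join(base, "etc/app.conf")
'''
print(count_usage(sample))   # (2, 1)
```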
On Thu, 2006-01-26 at 12:51 +1300, Tony Meyer wrote:
I haven't followed the entire thread (I'll try to find time to catch up) but while I think using __div__ to mean path concatenation is cute, I'm not sure I'd like to see it all over the place. It does seem awfully "FAST" ("fascinating and stomach turning" to use a term from years ago). What I don't like about os.path.join() is having to import os and having to type all those characters over and over again. What I /like/ about os.path.join is that you can give it a bunch of path components and have it return the correctly joined path, e.g. os.path.join('a', 'b', 'c'). That seems more efficient than having to create a bunch of intermediate objects. All in all, I'd have to say I'm -0 on __div__ for path concatenation. -Barry
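Barry's point about intermediate objects can be made visible with a hypothetical `/`-joining str subclass that counts how many instances it creates. `CountingPath` is illustrative only; `posixpath` stands in for `os.path` so the output is platform-independent.

```python
import posixpath

class CountingPath(str):
    """Hypothetical str subclass with a / join operator that counts instances created."""
    created = 0

    def __new__(cls, value):
        CountingPath.created += 1
        return super().__new__(cls, value)

    def __truediv__(self, other):
        return CountingPath(posixpath.join(self, other))


CountingPath.created = 0
chained = CountingPath('a') / 'b' / 'c'
print(chained, CountingPath.created)   # a/b/c 3  (one object per step)
print(posixpath.join('a', 'b', 'c'))   # a/b/c    (one call, no intermediates)
```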
I think that everything that can be said about __div__() has already been said. But this argument was really convincing: [Tony Meyer]
The vast majority of people (at least at the time) were either +0 or -0, not +1. +0's are not justification for including something.
There is no clear consensus either way. Ultimately, Guido will decide if he thinks it is clever or not. Meanwhile I'll remove it from the PEP and keep it as an "optional extension." Also, like Jason said, the removal of __div__() leads to the ultimate demise of joinpath(), woho! [Jason Orendorff]
    Path.cwd() / "foobar" ==> Path(Path.cwd(), "foobar")
    Path("foo") / "bar" / "baz" ==> Path("foo", "bar", "baz")

Still, in the simpler cases, __div__() looks really handy:

    os.chdir(pkgdir / "include") ==> os.chdir(Path(pkgdir, "include"))

Oh well. You can't have everything, can you? The updated PEP and an implementation are available from http://wiki.python.org/moin/PathClass. -- mvh Björn
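A minimal sketch of the constructor-based style in Björn's examples, assuming the constructor simply joins its arguments with `os.path.join` semantics (`posixpath` is used here for platform-independent output). This `Path` is a hypothetical stand-in, not the implementation on the wiki page.

```python
import posixpath

class Path(str):
    """Hypothetical sketch: the constructor joins its arguments like os.path.join."""

    def __new__(cls, *parts):
        if not parts:
            parts = (posixpath.curdir,)   # Path() is equivalent to Path('.')
        return super().__new__(cls, posixpath.join(*parts))


print(Path('foo', 'bar', 'baz'))      # foo/bar/baz
print(Path())                         # .
print(Path(Path('/srv'), 'include'))  # /srv/include
```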
John J Lee wrote:
My only fear with the / operator is that we'll end up with the same problems we have for using % in string formatting -- the order of operations might not be what users expect. Since join is conceptually an addition-like operator, I would expect:

    Path('home') / 'a' * 5

to give me:

    home/aaaaa

If I understand it right, it would actually give me something like:

    home/ahome/ahome/ahome/ahome/a

I don't want to claim this is the most common use case, but I've certainly seen auto-generated paths that look like 'a' * 20, and it would be a pity if using the / operator for Path objects did the wrong thing by default here... STeVe -- You can wordify anything if you just verb it. --- Bucky Katt, Get Fuzzy
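Steven's precedence worry can be reproduced with a hypothetical str subclass whose `/` joins paths: `/` and `*` share a precedence level and group left to right, and `str.__mul__` returns a plain string. `DivPath` is illustrative only.

```python
import posixpath

class DivPath(str):
    """Hypothetical str subclass whose / operator joins path components."""
    def __truediv__(self, other):
        return DivPath(posixpath.join(self, other))


# This parses as (DivPath('home') / 'a') * 5, i.e. string repetition:
print(DivPath('home') / 'a' * 5)     # home/ahome/ahome/ahome/ahome/a
# Parentheses give the intended result:
print(DivPath('home') / ('a' * 5))   # home/aaaaa
```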
Steven Bethard wrote:
What if we used "subpath" as the name instead of joinpath? The main appeal to me of the division operation is that it allows multiple path elements to be joined on a single line, but the joining method accepts an arbitrary number of arguments, which helps with that just as much, and doesn't raise precedence and readability questions. The above example would be:

    Path('home').subpath('a'*5)

An example of retrieving a config file's full name:

    Current:  os.path.join(HOME_DIR, APP_DIR, CONFIG_FILE)
    Division: HOME_DIR / APP_DIR / CONFIG_FILE
    Subpath:  HOME_DIR.subpath(APP_DIR, CONFIG_FILE)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
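A sketch of Nick's variadic `subpath()` on a hypothetical str subclass, assuming it uses `os.path.join` semantics (`posixpath` here). `SubPath` and the `HOME_DIR`/`APP_DIR`/`CONFIG_FILE` values are illustrative.

```python
import posixpath

class SubPath(str):
    """Hypothetical str subclass with a variadic subpath() method."""
    def subpath(self, *parts):
        return SubPath(posixpath.join(self, *parts))


print(SubPath('home').subpath('a' * 5))         # home/aaaaa

HOME_DIR = SubPath('/home/user')                # illustrative values
APP_DIR, CONFIG_FILE = '.myapp', 'config.ini'
print(HOME_DIR.subpath(APP_DIR, CONFIG_FILE))   # /home/user/.myapp/config.ini
```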
Steven Bethard wrote:
Both of these examples are rather silly, of course ;) There are two operators currently used commonly with strings (that I assume Path would inherit): + and %. Both actually make sense with paths too.

    filename_template = '%(USER)s.conf'
    p = Path('/conf') / filename_template % os.environ

which means:

    p = (Path('/conf') / filename_template) % os.environ

But probably the opposite is intended. Still, it will usually be harmless. Which is sometimes worse than usually harmful. + seems completely innocuous, though:

    ext = '.jpg'
    name = fields['name']
    image = Path('/images') / name + ext

It doesn't really matter what order it happens in there. Assuming concatenation results in a new Path object, not a str. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
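Ian's `%` and `+` cases can be checked with the same kind of hypothetical `/`-joining str subclass. The sketch brings out two details: inherited `%` and `+` return a plain `str`, not the subclass, and the left-to-right `%` grouping only bites when the joined prefix itself contains a `%`. `DivPath` and `env` are illustrative.

```python
import posixpath

class DivPath(str):
    """Hypothetical str subclass whose / operator joins path components."""
    def __truediv__(self, other):
        return DivPath(posixpath.join(self, other))


env = {'USER': 'bob'}                 # stands in for os.environ
template = '%(USER)s.conf'

# / and % share a precedence level and group left to right:
p = DivPath('/conf') / template % env            # == (DivPath('/conf') / template) % env
print(p, type(p).__name__)                       # /conf/bob.conf str

# The grouping stops being harmless once the prefix contains a '%':
try:
    DivPath('/data/100%') / template % env
except ValueError as exc:
    print('ValueError:', exc)
print(DivPath('/data/100%') / (template % env))  # /data/100%/bob.conf

# / binds tighter than +, so the join happens first; here either order gives the same string:
image = DivPath('/images') / 'cat' + '.jpg'
print(image, type(image).__name__)               # /images/cat.jpg str
```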
On Wed, 2006-01-25 at 21:02 -0600, Ian Bicking wrote:
Here's a good example of why I ultimately don't like __div__. The last line seems quite non-obvious to me. It's actually jarring enough that I have to stop and think about what it means because it /looks/ like there's math going on. OTOH, something like:

    image = Path('', 'images', name) + ext

or even better

    image = Path.join('', 'images', name) + ext

where .join is a staticmethod, seems much clearer. -Barry
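A sketch of Barry's staticmethod variant on a hypothetical str subclass, assuming `join` has `os.path.join` semantics. One caveat shows up: with those semantics a leading `''` yields a relative path, so the sketch passes `'/'` to get `/images/...`. Also, a staticmethod named `join` shadows `str.join`, so calling it on an instance ignores the instance.

```python
import posixpath

class JPath(str):
    """Hypothetical str subclass where join() is a staticmethod that builds a path."""

    @staticmethod
    def join(*parts):
        return JPath(posixpath.join(*parts))


name, ext = 'cat', '.jpg'
print(JPath.join('/', 'images', name) + ext)   # /images/cat.jpg
# With os.path.join semantics a leading '' adds nothing, so the result is relative:
print(JPath.join('', 'images', name))          # images/cat
```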
"Steven" == Steven Bethard <steven.bethard@gmail.com> writes:
Steven> My only fear with the / operator is that we'll end up with Steven> the same problems we have for using % in string formatting Steven> -- the order of operations might not be what users expect. Besides STeVe's example, (1) I think it's logical to expect that Path('home') / 'and/or' points to a file named "and/or" in directory "home", not to a file named "or" in directory "home/and". (2) Note that '/' is also the path separator used by URIs, which RFC 2396 gives different semantics from Unix. Most of my Python usage to date has been heavily web-oriented, and I'd have little use for / unless it follows RFC 2396. By that, I mean that I would want the / operator to treat its rhs as a relative reference, so the result would be computed by the algorithm in section 5.2 of that RFC. But this is not correct (realpath) semantics on Unix. -- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
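Stephen's first point can be checked directly with `posixpath` (what `os.path` is on Unix): joining the component `'and/or'` silently produces a path whose last component is `'or'`.

```python
import posixpath

joined = posixpath.join('home', 'and/or')
print(joined)                    # home/and/or
# Splitting it back does not recover a file named 'and/or' in 'home':
print(posixpath.split(joined))   # ('home/and', 'or')
```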
It's controversial that Path subclasses str. Some people think it's flat-out wrong. Even Bjorn argues that it's a practicality-vs-purity tradeoff. But a strong argument can be made that Path *should* be a string subclass, practicality be damned. Proof follows. I. Here's an example of the sort of thing you might say if you did *not* think of paths as strings: On 1/25/06, Stephen J. Turnbull <stephen@xemacs.org> wrote:
This makes no sense whatsoever. Ergo, by reductio ad absurdum, paths are strings. II. And here is the sort of thing you'd say if you thought of paths *solely* as strings:
The quandary is resolved by pointing out that URIs are not paths (in the sense of os.path and generally this whole horrible thread). Thus not all strings are paths. Hence the paths are a proper subset of the strings. By the existence of os.path, they have their own commonly-used operations. By definition, then, Path is a subclass of string, QED. Do I really buy all this? I dunno. To say "paths aren't strings" is all very well, and in a very abstract sense I almost agree--but you have to admit it sort of flies in the face of, you know, reality. Filesystem paths are in fact strings on all operating systems I'm aware of. And it's no accident or performance optimization. It's good design. -j
On Fri, Jan 27, 2006 at 06:19:52PM -0500, Jason Orendorff wrote:
The question isn't whether Path objects should *act* like strings. I haven't seen anyone argue that they shouldn't, except for a few specific aspects, like iteration, and those are argued on both sides of the subclassing camp. The question is whether they should be actual subclasses of the Python string type. As for what platforms do, if we want to stick to the platform handling of paths, we change nothing. That's apparently not what people want ;) -- Thomas Wouters <thomas@xs4all.net>
[Jason Orendorff]
Isn't that simply because filesystems aren't object orientated? I can't call methods of a path through the filesystem. There's a difference between a path, which is, yes, always (?) a string, and a Path object that provides convenient methods/properties. (Maybe one of the experimental object-orientated file systems has non- string paths. I have no idea). =Tony.Meyer
On 1/27/06, Jason Orendorff <jason.orendorff@gmail.com> wrote:
It makes perfect sense to me. However, since POSIX doesn't permit '/' in file names I would expect it to emit an error: Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. However, I'm not sure whether the error should be emitted when the Path is created, or when it's passed to open(). The former implies a set of OS-specific Path classes, and would allow subclassing from str. The latter allows (but does not require) a single universal Path class, and that would prohibit subclassing from str (because you need a higher-level representation to store path segments before converting them to a platform-specific format). I'm -0 on subclassing str in the short term and -1 on it in the long term. It's cruft and not something we want to be stuck with. -- Adam Olsen, aka Rhamphoryncus
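A minimal sketch of the "refuse to guess" behaviour Adam describes, assuming errors are raised at join time for POSIX names. `join_component` is a hypothetical helper; POSIX file names cannot contain `/` or the NUL byte, which is all it checks.

```python
import posixpath

def join_component(directory, name):
    """Hypothetical helper: append exactly one POSIX file name, or raise ValueError."""
    if '/' in name or '\0' in name:
        raise ValueError(f'{name!r} is not a valid single POSIX file name')
    return posixpath.join(directory, name)


print(join_component('home', 'notes.txt'))   # home/notes.txt
try:
    join_component('home', 'and/or')
except ValueError as exc:
    print(exc)
```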
"Jason" == Jason Orendorff <jason.orendorff@gmail.com> writes:
Jason> I. Here's an example of the sort of thing you might say if Jason> you did *not* think of paths as strings: [...] Jason> II. And here is the sort of thing you'd say if you thought Jason> of paths *solely* as strings: Please note that my point was entirely different from trying to decide whether to subclass strings. My point was precisely that because of this schizophrenia in the use of / as a path join operator in various string representations of paths, it's a bad choice. People are naturally going to write buggy code because they don't have the implemented spec in mind. Jason> Filesystem paths are in fact strings on all operating Jason> systems I'm aware of. I have no idea what you could mean by that. The data structure used to represent a filesystem on all OS filesystems I've used is a graph of directories and files. A filesystem object is located by traversing a path in that graph. Of course there's a string representation, especially for human use, but manipulating that representation as a string in programs is a regular source of bugs. In most cases, the graph is sufficiently constrained that string manipulations are mostly accurate representations of graph traversal, but not always, and you get defects. -- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
Stephen J. Turnbull:
Not always. IIRC very old MacOS used an integer directory ID and a string file name. The directory ID was a cookie that you received from the UI and passed through to the file system and there was little support for manipulating the directory ID. Textualized paths were never supposed to be shown to users. Neil
On 1/28/06, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Please note that my point was entirely different from trying to decide whether to subclass strings.
Noted -- sorry I took you out of context there; that was careless.
You seem to think that because I said "operating systems", I'm talking about kernel algorithms and such. I'm not. By "on all operating systems" I really mean systems, not kernels: system APIs, standard tools, documentation, the conventions everyone follows--that sort of thing. Userspace. Thought experiment: How are filesystem paths used? Well, programs pass them into system calls like open() and chmod(). Programs use them to communicate with other programs. Users pass them to programs. Compare this to how you'd answer the question "How are integers used?": I think paths are used more for communication, less for computation. Their utility for communication is tightly bound to their string-nature. Essentially all APIs involving filesystem paths treat them as strings. It's not just that they take string parameters. The way they're designed, they encourage users to think of paths as strings, not graph-paths. Java's stdlib is the only API that even comes close to distinguishing paths from strings. The .NET class library doesn't bother. Many, many people much smarter than I have thought about creating non-string-oriented filesystem APIs. Somehow it hasn't caught on. Essentially all users expect to see a filesystem path as a string of characters in the conventional format. Display it any other way (say, as a sequence of edge-names) and you risk baffling them. My position is (a) the convention that paths are strings really does exist, embedded in the design and culture of the dominant operating systems--in fact it's overwhelming, and I'm surprised anyone can miss it; (b) there might be a reason for it, even aside from momentum. -j
"Jason" == Jason Orendorff <jason.orendorff@gmail.com> writes:
Jason> You seem to think that because I said "operating systems", Jason> I'm talking about kernel algorithms and such. I can see how you'd get that impression, but it's not true. My reason for mentioning OS-level filesystems was to show that even in that limited domain, treating paths as strings leads to bugs. Jason> I think paths are used more for communication, less for Jason> computation. True. For that purpose it is absolutely essential to have a string representation. However, I was discussing the use of "/" to invoke path composition, which is a computation. Nobody will use that for communication (except to describe a path expression in graph theoretic terms), and I don't think it's a good idea to use that particular symbol for that operation. -- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
Steven Bethard wrote:
Is there any very deep magic that says / is the One True Operator to use for this, given that there's to be an operator for it? For instance, & has a lower, more appropriate precedence (so that Path('home') & 'a'*5 does something less unexpected), and doesn't look quite so much as if it denotes arithmetic, and avoids semantic interference from the idea that "/" should divide things or make them smaller. (Though, for what it's worth, I think sticking another subdirectory onto a path *is* dividing and making smaller: think of a path as representing a subtree.) You do lose the pun on the Unix path separator, which is a shame. -- g
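Greg's precedence argument for `&` can be checked with a hypothetical str subclass that overloads `__and__`: `&` binds more loosely than `*`, `%` and `+`, so the right-hand operand is fully evaluated before joining. `AmpPath` is illustrative only.

```python
import posixpath

class AmpPath(str):
    """Hypothetical str subclass using & (instead of /) as the join operator."""
    def __and__(self, other):
        return AmpPath(posixpath.join(self, other))


print(AmpPath('home') & 'a' * 5)             # home/aaaaa
print(AmpPath('/images') & 'cat' + '.jpg')   # /images/cat.jpg
```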
BJörn Lindqvist wrote:
Curious how often I use os.path.join and division, I searched a project of mine, and in 12k lines there were 34 uses of join, and 1 use of division. In smaller scripts os.path.join tends to show up a lot more (per line). I'm sure there's people who use division far more than I, and os.path.join less, but I'm guessing the majority of users are more like me. That's not necessarily a justification of / for paths, but at least this use for "/" wouldn't be obscure or mysterious after you get a little experience seeing code that uses it. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
On Wed, 2006-01-25 at 21:37 +0100, BJörn Lindqvist wrote:
<PEP8> Function Names Function names should be lowercase, with words separated by underscores as necessary to improve readability. mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility. Method Names and Instance Variables Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability. </PEP8> It is very clear. Whether you agree with PEP 8 or not is not relevant to this discussion. Since this is a completely new module, it should be correctly named from the start. The "familiarity with os.path argument" is a very weak one, IMHO. Plus, the names are full of redundancy. Why abspath(), joinpath(), realpath(), splitall()? Why not instead: absolute(), join(), real(), split()? Remember that they are all methods of a Path class, you don't need to keep repeating 'path' all over the place[1]. On a slightly different subject, regarding path / path, I think path + path feels much more natural. Path.join is really just a string concatenation, except that it adds a path separator in the middle if necessary, if I'm not mistaken. Best regards. [1] Yes, I'm the kind of guy who hates struct timeval having tv_sec and tv_usec field members instead of sec and usec. -- Gustavo J. A. M. Carneiro <gjc@inescporto.pt> <gustavo@users.sourceforge.net> The universe is always one step beyond logic
[Gustavo J. A. M. Carneiro]
+1 for all of those that aren't also string methods. +0.9 for those that are string methods, although I suppose this depends on how string-like a Path ends up being. Other than join() (covered in the __div__ discussion), split() is an interesting case, since the default split-on-whitespace (str.split) doesn't make a whole lot of sense with a Path, but split-on-pathsep (os.path.split) does. Does it make sense to be able to split a path on something else (like str.split), or should people just convert to/from a string? Should there be a maxsplit argument? =Tony.Meyer
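The competing meanings of `split` that Tony lists, side by side, using plain strings and `posixpath.split` (the Unix `os.path.split`):

```python
import posixpath

p = '/usr/local/lib'
print(p.split())            # ['/usr/local/lib']  (str.split: on whitespace)
print(p.split('/'))         # ['', 'usr', 'local', 'lib']
print(p.split('/', 1))      # ['', 'usr/local/lib']  (maxsplit=1)
print(posixpath.split(p))   # ('/usr/local', 'lib')  (os.path.split: head, tail)
```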
Gustavo J. A. M. Carneiro wrote:
No, it isn't, which maybe is why / is bad. os.path.join(a, b) basically returns the path as though b is interpreted to be relative to a. I.e., os.path.join('/foo', '/bar') == '/bar'. Not much like concatenation at all. Plus string concatenation is quite useful with paths, e.g., to add an extension. If a URI class implemented the same methods, it would be something of a question whether uri.joinpath('/foo/bar', 'baz') would return '/foo/baz' (and urlparse.urljoin would) or '/foo/bar/baz' (as os.path.join does). I assume it would be the latter, and urljoin would be a different method, maybe something novel like "urljoin". -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
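The two semantics Ian contrasts, using the standard library. `urllib.parse.urljoin` (the Python 3 home of `urlparse.urljoin`) resolves references per RFC 3986, the successor to RFC 2396, while `posixpath.join` is `os.path.join` on Unix:

```python
import posixpath
from urllib.parse import urljoin

# Filesystem join: 'baz' goes inside '/foo/bar'.
print(posixpath.join('/foo/bar', 'baz'))   # /foo/bar/baz
# URL reference resolution: 'baz' replaces the last segment of '/foo/bar'.
print(urljoin('/foo/bar', 'baz'))          # /foo/baz
# An absolute second argument discards the first in both.
print(posixpath.join('/foo', '/bar'))      # /bar
print(urljoin('/foo/bar', '/bar'))         # /bar
```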
On Wed, 2006-01-25 at 22:35 -0600, Ian Bicking wrote:
os.path.join('/foo', '/bar') == '/bar'. Not much like concatenation at all.
Really? This is not like the Unix command line. At least in Linux, /foo/bar is the same as /foo//bar and /foo///bar, etc. But I admit it can be useful in some cases.
Plus string concatenation is quite useful with paths, e.g., to add an extension.
I see your point. Although perhaps adding an extension to a file should be the exception and not the rule, since adding extensions is rarely used compared to joining paths? Maybe Path.add_extension()? BTW, regarding Path subclassing basestring, there exists "prior art" for this Path thing in SCons. In SCons, we (users, I'm not a scons dev) have to constantly deal with Node instances. Most scons functions that accept Nodes also accept strings, but a Node is not a string. When calling an os function with Nodes, one has to convert it to string first, using str(). IMHO, having to decorate Node instances with str() sometimes is no big deal, really. And, given enough time, perhaps most of the python standard library could be enhanced to accept Path instances in addition to C strings.
I honestly don't understand the usefulness of join('/foo/bar', 'baz') ever returning '/foo/baz' instead of '/foo/bar/baz'. How would the former be of any use? If it has no use, then please don't complicate things even more :) Regards. -- Gustavo J. A. M. Carneiro <gjc@inescporto.pt> <gustavo@users.sourceforge.net>
On Thu, 2006-01-26 at 16:17 +0100, Fredrik Lundh wrote:
That's not how I see it. A web browser, in order to resolve the link 'baz' in the page '/foo/bar', should do: join(dirname('/foo/bar'), 'baz') == join('/foo', 'baz') == '/foo/baz'. Regards. -- Gustavo J. A. M. Carneiro <gjc@inescporto.pt> <gustavo@users.sourceforge.net>
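Gustavo's browser example can be checked against `urllib.parse.urljoin`. For a plain relative link, joining onto `dirname()` of the page gives the same answer; the sketch also shows a query-only reference, where the two approaches diverge, so the `dirname()` rule is an approximation rather than the full RFC 3986 algorithm.

```python
import posixpath
from urllib.parse import urljoin

page, link = '/foo/bar', 'baz'
# For a plain relative link, resolution matches join(dirname(page), link).
print(urljoin(page, link))                               # /foo/baz
print(posixpath.join(posixpath.dirname(page), link))     # /foo/baz
# A query-only reference keeps the page itself, so the shortcut diverges:
print(urljoin(page, '?q=1'))                             # /foo/bar?q=1
print(posixpath.join(posixpath.dirname(page), '?q=1'))   # /foo/?q=1
```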
On Wed, Jan 25, 2006 at 09:37:04PM +0100, BJörn Lindqvist wrote:
Inheritance from string (Jason)
This is my only problem with the PEP. It's all very nice that subclassing from string makes it easier not to break things, but subclassing implies a certain relationship. That relationship doesn't exist, in this case. Having the string subclass behave differently than strings (consider the __iter__ and join methods) is a bad idea. I can dish up a half dozen contrived problem cases, but the main reason I don't like it is that it feels wrong. If the reason to subclass string is that it's too hard to make an object 'string-like' at a low enough level for the C API, I suggest fixing that, instead. If that means Path has to wait until Python 2.6, then that's too bad. The inability to feed non-string objects to C functions/types such as open() has troubled me before, and though I haven't invested a lot of time in it, I don't quite understand why it isn't possible. Fixing it in a backward-compatible manner may require a new __hook__, although I think using __str__ or __unicode__ shouldn't be too problematic. Even if fixing the "%es"/"%et" etc. formats to the arg-parsing methods is infeasible, I would prefer a new format code for 'string-alike as C string' over the unpythonic inappropriate subclassing. Although, even if the subclassing was completely appropriate, I would much rather improve builtin support for ducktyping than showcase subclassing ;) But perhaps I've missed that single, all-convincing argument that it has to use subclassing... In that case, could it be added to the PEP? :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
[Thomas Wouters]
This is the soul of arguing for purity's sake when practicality would dictate something else. If you remove the basestring superclass, then you remove the ability to use path objects as a drop-in replacement for any path string right now. You will either have to use str(pathobj) or carefully check that the function/framework you are passing the path to does not use isinstance() or any of the string methods that are now gone. http://groups.google.com/group/comp.lang.python/browse_thread/thread/1f5bcb6... -- Michael Hoffman <hoffman@ebi.ac.uk> European Bioinformatics Institute
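The isinstance() concern is concrete. Code that dispatches on the type treats a str subclass as a filename but treats anything else as a file object. The following is a minimal sketch. The names accept_name_or_file, StrSubclassPath and WrapperPath are hypothetical and are not taken from any real library.

```python
import io


def accept_name_or_file(target):
    """Typical dual-purpose API: a filename or an already-open file."""
    if isinstance(target, str):
        return "filename"
    return "file object"


class StrSubclassPath(str):
    """Path that is-a string (the approach path.py takes)."""


class WrapperPath:
    """Path that merely has-a string."""

    def __init__(self, raw):
        self.raw = raw

    def __str__(self):
        return self.raw


as_subclass = accept_name_or_file(StrSubclassPath("/etc/passwd"))    # 'filename'
as_wrapper = accept_name_or_file(WrapperPath("/etc/passwd"))         # 'file object': wrong branch
as_converted = accept_name_or_file(str(WrapperPath("/etc/passwd")))  # 'filename' after str()
as_stream = accept_name_or_file(io.StringIO("data"))                 # 'file object'
```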
On Thu, Jan 26, 2006 at 09:26:27AM +0000, Michael Hoffman wrote:
This is the soul of arguing for purity's sake when practicality would dictate something else.
If we're going to argue that point, I don't believe this is the practicality that the 'zen of python' talks about. Practicality is the argument for 'print', and for requiring the ':' before suites, and for some of the special cases in the Python syntax and module behaviour. It isn't about the implementation. The argument to subclass string is, as far as I can tell, only the ease of implementation and the ease of transition. Nothing in the old thread convinced me otherwise, either. I've never seen Guido go for an implementation-practical solution just because he couldn't be arsed to do the work to get a conceptually-practical solution. And subclassing string isn't conceptually-practical at all.
More to the point, you will have to carefully check whether the function/framework will use the Path object in a way the Path object can handle. There's already discussion about whether certain methods should be 'disabled' in Path objects, or whether they should be doing something conceptually different. And subclassing string is not going to solve all these issues anyway.
Even in the standard library there's a scary amount of 'type(x) == type("")' checks, most notably in exactly the type of function that takes both a filename and a file-like object. I don't believe going out of your way to cater to these kinds of typechecks is the right way to solve the problem. I believe the problem is more that there isn't a unified, obvious, 'one-way' to do the typechecks -- or rather, to avoid the typechecks.
Parts of Python do duck-typing and nothing else; this usually works fine, and is quite flexible. Other parts, especially (but not exclusively) the parts written in C and Python code that directly deals with those C parts, need to care more about actual types. Python offers no standard or accepted or even vaguely preferred way of doing that. The standard library doesn't even do this uniformly, so how can we expect anyone to ever get this 'right'? Especially since the 'right' way depends on what you want to do with the result. This problem pops up most visibly in treating-as-int (indexing) and in treating-as-string (open() and friends), but I'm sure there are more.
You could see this as an argument for formal interfaces. Perhaps it is. It could also be an argument for just a few more __hooks__. Guido already suggested __index__, which would mean 'treat me as this int for indexing and other such operations' -- which is not the same thing as __int__. Likewise, treating-as-string would require a different hook than __str__ or __repr__, and should explicitly not be defined for most objects, only those that actually *want* to be treated as a string.
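Python adopted the distinction between __index__ and __int__ described above (PEP 357, Python 2.5). The following is a minimal sketch assuming current Python 3. It shows that an object with only __int__ is rejected as a list index, while an object with __index__ is accepted. Both class names are hypothetical.

```python
import operator


class OnlyInt:
    """Convertible with int(), but not declared usable as an index."""

    def __int__(self):
        return 1


class Indexable:
    """Explicitly opts in to being treated as an integer index."""

    def __index__(self):
        return 1


items = ["a", "b", "c"]
picked = items[Indexable()]             # 'b'
as_index = operator.index(Indexable())  # 1
as_int = int(OnlyInt())                 # 1

try:
    items[OnlyInt()]
    only_int_rejected = False
except TypeError:
    only_int_rejected = True            # __int__ alone is not enough for indexing
```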
And there should probably be more. Or, maybe, we could do it all in one __hook__. Say, __equivalent__. obj.__equivalent__(cls) would return the instance of 'cls' that represents 'obj', or raise an appropriate error. I say 'instance of 'cls'', which means it could be an instance of a subclass, too. It isn't duck-typing and it doesn't replace interfaces, not by far, since it actually returns a new object (and thus can't be a proxy-type; I don't think changing the equivalent should change the original, anyway.) Rather, it'd be the standard way to glue duck-typing (and formal interfaces, if you use those) with strict typing mechanisms (like C's.) -- Thomas Wouters <thomas@xs4all.net>
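The __equivalent__ hook described above was only a proposal, and nothing by that name exists in Python. The following is a minimal sketch of how code might consume such a protocol. The helper equivalent(obj, cls) and the class HypotheticalPath are both hypothetical. The helper returns obj unchanged if it is already an instance of cls. Otherwise it asks obj.__equivalent__(cls), and it raises TypeError when there is no usable answer.

```python
def equivalent(obj, cls):
    """Hypothetical: return obj as an instance of cls (or a subclass), or raise."""
    if isinstance(obj, cls):
        return obj
    hook = getattr(type(obj), "__equivalent__", None)
    if hook is None:
        raise TypeError(f"{type(obj).__name__} has no {cls.__name__} equivalent")
    result = hook(obj, cls)
    if not isinstance(result, cls):  # subclasses of cls are acceptable
        raise TypeError(f"__equivalent__ returned {type(result).__name__}")
    return result


class HypotheticalPath:
    """Not a str subclass; offers a str equivalent on request."""

    def __init__(self, raw):
        self._raw = raw

    def __equivalent__(self, cls):
        if cls is str:
            return str(self._raw)  # a new object, not a proxy
        raise TypeError(f"HypotheticalPath has no {cls.__name__} equivalent")


as_str = equivalent(HypotheticalPath("/tmp/x"), str)  # '/tmp/x'
unchanged = equivalent("plain", str)                  # 'plain' (already a str)
try:
    equivalent(HypotheticalPath("/tmp/x"), int)
    int_rejected = False
except TypeError:
    int_rejected = True
```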
[Thomas Wouters]
[Michael Hoffman]
This is the soul of arguing for purity's sake when practicality would dictate something else.
[Thomas Wouters]
I don't understand what "conceptually-practical" is or how it differs from "conceptually pure" which is what it seems that you're looking for. It's not hard to give Path a has-a relationship to basestring instead of an is-a relationship, so it really doesn't save much in terms of implementation.
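A has-a design is cheap to implement with delegation. The following is a minimal sketch. The class name DelegatingPath is hypothetical. The class forwards only the string methods that are explicitly listed, which is one way to "disable" methods such as join() without subclassing str.

```python
class DelegatingPath:
    """Holds a string (has-a) and forwards only selected str methods."""

    _forwarded = frozenset({"startswith", "endswith", "lower", "upper"})

    def __init__(self, raw):
        self._raw = str(raw)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self._forwarded:
            return getattr(self._raw, name)
        raise AttributeError(name)

    def __str__(self):
        return self._raw

    def __repr__(self):
        return f"DelegatingPath({self._raw!r})"


p = DelegatingPath("/usr/lib/libc.so")
print(p.endswith(".so"))   # True
print(hasattr(p, "join"))  # False: str.join is not forwarded
print(str(p))              # /usr/lib/libc.so
```

The cost shows up at the call sites. Any API that expects a real str now needs str(p), which is the trade-off the next message discusses.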
Yes, and I think all of this discussion is focused on conceptual purity and misses one of the big wins of the Path module for current users--it can be trivially used anywhere where a str is expected today. If you're going to start deciding that certain str methods should be disabled for some reason, then it shouldn't be a str subclass, because it will no longer behave like string-plus. In previous discussions, string methods were identified that one programmer thought would not be useful on a path, but others disagreed. Disabling methods serves no useful purpose, except to shorten dir(). I've been using path.py for some time, and I can tell you that it would be a lot less useful if it no longer behaved like string-plus. -- Michael Hoffman <hoffman@ebi.ac.uk> European Bioinformatics Institute
Michael Hoffman wrote:
I've been using path.py for some time, and I can tell you that it would be a lot less useful if it no longer behaved like string-plus.
As Jason pointed out elsewhere, the strict typechecks that exist *in* the Python core, and the fact that path.py is *outside* that core, make the workaround of subclassing string necessary. Since the PEP process has the power to alter the *core*, we have options other than "this is a string, only not". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
On 1/26/06, Thomas Wouters <thomas@xs4all.net> wrote:
I don't think there is consensus at all. I've seen plenty of arguments, either directly against inheritance from string, or against features which exist *because* of the inheritance (e.g., we can't use join() because it's a string method).
Agreed. Path objects don't feel like strings to me, either. It's certainly *arguable* that "paths are strings" in some ideal sense, but in a very practical sense they are not. Operations like split, join, justification and trimming, all of which are part of the Python string type (and hence constitute part of what it means to "be a string" in Python), do not have any sensible meaning on paths. The only justification for making Path a string subtype seems to me to be avoiding a few conversions - open(path) rather than open(str(path)), for example. I'd rather have to explicitly convert, to be honest. (But I'd happily accept changes to open() etc. to take path objects directly.)
Adaptation (PEP 246) would let paths be *adaptable* to strings, without *being* strings... :-) Paul.
On 1/25/06, BJörn Lindqvist <bjourne@gmail.com> wrote:
I mind (see my previous post)...
Hardly. I've seen some pretty strong arguments (both for and against) - not what I'd describe as everyone saying they don't care. FWIW, I find the / operator ugly. Also, multiple concatenation (path / "a" / "b" / "c") results in building lots of intermediates, where path.join("a", "b", "c") need not. Arguing that you can't reuse string methods is bogus, IMHO, as the requirement to subclass from string is far from clear.
Actually, reading that, I'd suggest:
- an append() method to add extra components to a path
- a multi-arg Path() constructor
So, we have
- path.append("a", "b", "c")
- Path("C:", "Windows", "System32")
Quick question - how do Path objects constructed from relative paths behave? Are there such things as relative path objects? Consider
    p1 = Path("a")
    p2 = Path("b")
Is p1.append(p2) (or p1/p2) legal? What does it mean? I'd have to assume it's the same as Path("a", "b"), but I'm not sure I like that... What about Path("/a").append(Path("/b")) ???
Also note that my example Path("C:", "Windows", "System32") above is an *absolute* path on Windows. But a relative (albeit stupidly-named :-)) path on Unix. How would that be handled?
Not that os.path gets it perfect:
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
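The point about intermediates can be made concrete with a toy class that counts how many path objects get built. The following is a minimal sketch. ToyPath, its joinpath method and the counter are hypothetical and are not part of path.py. The last line shows the existing os.path.join rule, under which a later absolute component discards everything before it. That rule is one possible answer to the Path("/a").append(Path("/b")) question.

```python
import posixpath


class ToyPath:
    created = 0  # counts every ToyPath constructed

    def __init__(self, raw):
        ToyPath.created += 1
        self.raw = raw

    def __truediv__(self, other):
        # Every "/" builds a brand-new ToyPath.
        return ToyPath(posixpath.join(self.raw, other))

    def joinpath(self, *parts):
        # All components are joined in one call: one new object.
        return ToyPath(posixpath.join(self.raw, *parts))


base = ToyPath("/usr")

ToyPath.created = 0
chained = base / "a" / "b" / "c"
chained_count = ToyPath.created      # 3 objects built

ToyPath.created = 0
joined = base.joinpath("a", "b", "c")
joined_count = ToyPath.created       # 1 object built

# Existing os.path.join rule: a later absolute component wins.
abs_then_abs = posixpath.join("/a", "/b")  # '/b'
```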
But os.path manipulates strings representing pathnames (and I can forgive oddities like this by noting that some rules about pathnames are pretty subtle...). I'd have higher standards for a dedicated Path object. Arguably, Path objects should always maintain an absolute path - there should be no such thing as a relative Path. So you would have
    str(Path("whatever")) == os.path.abspath("whatever")
It also allows Path("C:", "Windows") to do different things on Windows and Unix (absolute on Windows, relative to os.curdir on Unix). This would imply that Path("a", a_path) or a_path.append(another_path) is an error. And of course, for this to work, Path objects *can't* be a subclass of string... :-) Paul.
on 26.01.2006 14:15 Paul Moore said the following: [snip]
wrong, Path("C:", "Windows", "System32") is a relative path on windows. see below.
this is misleading. observe::
    In [1]: import os
    In [2]: os.path.join(".", os.path.join("C:", "Windows", "System32"))
    Out[2]: '.\\C:Windows\\System32'
but::
    In [3]: os.path.join(".", os.path.join("C:\\", "Windows", "System32"))
    Out[3]: 'C:\\Windows\\System32'
The second example uses an absolute path as second argument, and as os.path.join should do, the first argument is discarded. The first case is arguably a bug, since, on Windows, C:Windows\System32 is a path relative to the *current directory on disk C:*. If the cwd on C: were C:\temp, then C:Windows\System32 would point to C:\temp\Windows\System32. The problem is that Windows has a cwd per drive... (I cannot even guess why ;-) For the sake of illustration, the following is a WinXP cmd session::
    Microsoft Windows XP [Version 5.1.2600]
    (C) Copyright 1985-2001 Microsoft Corp.
    C:\temp>d:
    D:\>cd HOME
    D:\HOME>c:
    C:\temp>d:
    D:\HOME>c:
    C:\temp>cd d:bin
    C:\temp>d:
    D:\HOME\bin>
[snip]
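The drive-relative behaviour can be reproduced on any platform, because Python ships both the Windows (ntpath) and POSIX (posixpath) implementations of os.path. The following is a minimal sketch run against current Python 3, not the 2.4 session shown above.

```python
import ntpath
import posixpath

parts = ("C:", "Windows", "System32")

# Windows rules: "C:" is a drive without a root, so the result is drive-relative.
win = ntpath.join(*parts)                     # 'C:Windows\\System32'
win_is_abs = ntpath.isabs(win)                # False
win_drive, win_rest = ntpath.splitdrive(win)  # ('C:', 'Windows\\System32')

# Adding a root after the drive makes the path absolute.
rooted_is_abs = ntpath.isabs(ntpath.join("C:\\", "Windows", "System32"))  # True

# POSIX rules: "C:" is just an ordinary directory name.
unix = posixpath.join(*parts)                 # 'C:/Windows/System32'
unix_is_abs = posixpath.isabs(unix)           # False
```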
Arguably, Path objects should always maintain an absolute path - there should be no such thing as a relative Path. So you would have
you realise that one might need and/or want to represent a relative path?
Stefan Rank wrote:
Of course, but it seems to me a relative path is a different type from an absolute path, in the same way that a timedelta is different from a datetime. For example:
* You can't open a relative path without reference to some absolute path (possibly the cwd).
* You can't join two absolute paths, but you can join a relative path to another relative path, or to an absolute path.
Cheers, Aaron
--------------------------------------------------------------------
Aaron Bingham
Senior Software Engineer
Cenix BioScience GmbH
--------------------------------------------------------------------
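The datetime/timedelta analogy suggests a pair of types whose combination rules are enforced by the + operator. The following is a minimal sketch using POSIX-style strings. AbsPath and RelPath are hypothetical names. The datetime lines show the standard library behaviour that the analogy borrows.

```python
import posixpath
from datetime import datetime, timedelta


class RelPath:
    def __init__(self, raw):
        if posixpath.isabs(raw):
            raise ValueError(f"not relative: {raw!r}")
        self.raw = raw

    def __add__(self, other):
        # relative + relative -> relative
        if isinstance(other, RelPath):
            return RelPath(posixpath.join(self.raw, other.raw))
        return NotImplemented


class AbsPath:
    def __init__(self, raw):
        if not posixpath.isabs(raw):
            raise ValueError(f"not absolute: {raw!r}")
        self.raw = raw

    def __add__(self, other):
        # absolute + relative -> absolute; absolute + absolute is undefined
        if isinstance(other, RelPath):
            return AbsPath(posixpath.join(self.raw, other.raw))
        return NotImplemented


home = AbsPath("/home/aaron")
full = home + (RelPath("docs") + RelPath("notes.txt"))  # AbsPath('/home/aaron/docs/notes.txt')
try:
    home + AbsPath("/tmp")
    abs_plus_abs_ok = True
except TypeError:
    abs_plus_abs_ok = False

# The same shape as datetime: datetime + timedelta works, datetime + datetime does not.
next_day = datetime(2006, 1, 26) + timedelta(days=1)
try:
    datetime(2006, 1, 26) + datetime(2006, 1, 27)
    dt_plus_dt_ok = True
except TypeError:
    dt_plus_dt_ok = False
```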
on 26.01.2006 16:34 Aaron Bingham said the following:
I think the datetime/timedelta analogy is not bad: A datetime is actually also a time delta, relative to some given start time; internally this is normally the "epoch". For human-readable representations it is the birth of your chosen deity, or the creation of the universe, ... The start time for datetime is implicit. Python has chosen some absolute reference. For paths that absolute reference (the root) is very much context dependent (platform dependent). You *can* open a relative path - because processes always have an implicit cwd as part of their context. But you might also want to represent paths that are relative on another host than the one your program is running on.
I don't think it makes sense to design a Path class that captures the abstract concept of a path - because of the semantic differences between unix paths, windows paths, URL paths, ... I see the Path object as a special kind of string that has helpful methods for relating it to the workings of filesystems in general and the local filesystem in particular. But it is still just an ordinary string that happens to be used as a special kind of address. I try to separate the concept of the 'object in the filesystem' (which is the domain of Python's file objects) from the 'hierarchical address to an object' (which is what the Path objects make easier). (Java has these two conflated in one.)
So, to get to the point, a `file` is a thing that should always have an absolute path. (And it does. It should probably grow a .path attribute in addition to .name? This could return None for files without paths.) A Path should be able to contain absolute, relative, valid, as well as invalid (on a given OS) paths.
In case future systems manage to reduce the importance of the legacy crutch that is the hierarchical filesystem ;-) we might get query-like addresses: '/tmp/[author=me;type=text/html]' and Path might grow to deal with it. Sorry, I digress.
+1 on Path as an enhanced string that bundles file-system-address related methods. stefan
On 1/26/06, Stefan Rank <stefan.rank@ofai.at> wrote:
Hmm, relative to the CWD on C: is a valid concept, and that is a potential meaning. I hadn't thought of that.
Thanks for the clarification, you are right in your analysis. However, it doesn't really affect my main point, which was that there should be no such thing as a relative Path (please note - I say "Path" here, to refer to the new Path object, as opposed to the general concept of an OS file path). [...]
Absolutely. But not a Path (see distinction above). Aaron Bingham's analogy with time/timedelta applies well here. Relative paths, like relative times, have their own special semantics, which deserve to be addressed in a separate class. You argue that time is "merely" a timedelta with a fixed start point. I'd disagree - the key point with timedeltas is that they need careful handling (DST issues, for example) _depending upon precisely what they are added to_ - these issues are avoided by the time type exactly because it has a constant base. In exactly the same way, absolute paths have simpler semantics precisely because they are absolute. Paul.
on 27.01.2006 11:16 Paul Moore said the following:
I see your point. I guess there are two options:
- `Path` as an enhanced string type that bundles methods related to file system addressing
- `Path`s that accurately reflect the possible (abstract) paths. There would be a Path and a PathDelta (with appropriate combining semantics), and probably a UnixPath, a WindowsPath, maybe a URLPath. And there need to be appropriate methods for combining them with/creating them from strings.
I actually think the latter would be very neat, and somewhere else in this thread someone talks about his `Tree` - `Path` - `File` classes with specialised subclasses. The first option, however, has the benefit of simplicity, and there is a working implementation. Well - I'm not the one to decide. And I think anything that bundles path-related stuff (using a little object orientation) and cleans up the standard library is a step forward. cheers, s
I've submitted an updated version of the PEP. The only major change is that instead of the property atime and the method getatime() there is now only one method named atime(). I also added some information about the string inheritance problem to Open Issues. I still have no idea what to do about it, though. -- mvh Björn
On 1 feb 2006, at 19:14, BJörn Lindqvist wrote:
The current PEP still contains some redundancy between properties and methods under Specification:
    basename() <-> name
    basename(), stripext() <-> namebase
    splitpath() <-> parent, name (documented)
I would like to suggest using only properties, and using splitall() to obtain a tuple with the complete breakdown of the path. And maybe splitall() could then be renamed to split(). The directory methods mkdir()/makedirs() and rmdir()/removedirs() could be unified. To me it seems they only exist because of Un*x details. my $0.005 --eric
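The splitall() suggested above can be built from repeated os.path.split() calls. The following is a minimal sketch of one possible implementation using POSIX rules. It is not necessarily the one in path.py, whose handling of edge cases such as trailing slashes may differ. The root, if any, is kept as the first element so that the parts can be rejoined.

```python
import posixpath


def splitall(path):
    """Break a POSIX path into all of its components, root first."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:
            # Reached the root ('/') or ran out of components ('').
            if head:
                parts.append(head)
            break
        parts.append(tail)
        path = head
    parts.reverse()
    return parts


absolute = splitall("/usr/lib/python")  # ['/', 'usr', 'lib', 'python']
relative = splitall("a/b/c")            # ['a', 'b', 'c']
rejoined = posixpath.join(*absolute)    # '/usr/lib/python'
```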
On Tuesday 24 January 2006 20:22, BJörn Lindqvist wrote:
This definition seems confusing because it splits the glob pattern string in two ('/lib', and '*.so'). Unless there is an intention to change the behavior of the current glob module, why not make the glob method parameterless:
    glob.glob("/lib/*.so") ==> Path("/lib/*.so").glob()
Possible confusion with the one-parameter version: Does glob matching happen on the first half too? That is, does Path('*').glob('*.so') match files in any directory, or only in directories whose name is an asterisk? What behavior can I expect from Path('/foo/').glob(bar), where bar is some arbitrary string? It could be reasonable to expect that it would only match filenames inside the foo directory. However, it could also be reasonable to expect to be able to use bar=='/etc/*'. -- Toby Dickenson
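The ambiguity comes down to whether the directory half is itself treated as a pattern. With the glob module, joining the halves first makes both of them patterns. Escaping the directory with glob.escape (Python 3.4+) restricts matching to the literal directory. The following is a minimal sketch in a temporary directory. The file names are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for sub in ("a", "b"):
        os.mkdir(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, "lib.so"), "w").close()

    # Both halves are patterns: "*" matches every subdirectory.
    both = sorted(glob.glob(os.path.join(glob.escape(tmp), "*", "*.so")))
    parents = [os.path.basename(os.path.dirname(p)) for p in both]

    # Only the second half is a pattern: a directory literally named "*" does not exist.
    literal = glob.glob(os.path.join(glob.escape(os.path.join(tmp, "*")), "*.so"))

print(parents)  # ['a', 'b']
print(literal)  # []
```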
On 1/25/06, Toby Dickenson <tdickenson@devmail.geminidataloggers.co.uk> wrote:
Well, let's make this look more like real code:
    #line 1
    LIB_DIR = "/lib"
    ==> LIB_DIR = Path("/lib")
    #line 296
    libs = glob.glob(os.path.join(LIB_DIR, "*.so"))
    ==> libs = LIB_DIR.files("*.so")
Clearer? In d.files(pattern), d is simply the root directory for the search. The same is true of all the searching methods: dirs(), walkfiles(), walkdirs(), etc. I actually never use path.glob(). For example, here files() is actually more accurate, and the word "files" is surely clearer than "glob". Given files(), dirs(), and listdir(), I have never found a real use case for glob(). -j
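The semantics described for d.files(pattern) can be approximated with os.listdir and fnmatch. In those semantics, d is only the root of the search and the pattern is matched against entry names. The following is a minimal sketch. files_in is a hypothetical stand-alone function and is not the path.py source.

```python
import fnmatch
import os
import tempfile


def files_in(directory, pattern=None):
    """Regular files directly inside `directory` whose names match `pattern`."""
    result = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and (pattern is None or fnmatch.fnmatch(name, pattern)):
            result.append(full)
    return result


with tempfile.TemporaryDirectory() as lib_dir:
    for name in ("libc.so", "libm.so", "README"):
        open(os.path.join(lib_dir, name), "w").close()
    os.mkdir(os.path.join(lib_dir, "dir.so"))  # a directory, so files_in skips it
    libs = [os.path.basename(p) for p in files_in(lib_dir, "*.so")]

print(libs)  # ['libc.so', 'libm.so']
```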
<Delurking> The path module has a great idea, but it does too much -- it conflates too many ideas into a single module. It has methods for dealing with files (open, bytes, read, etc.) as well as methods for dealing with a filesystem tree as a whole (relpath, abspath, etc.). Both of these ideas are tangentially related to paths, but really aren't in the same conceptual box. Not too long ago, I had to build something loosely based upon the path module. Instead of using it as-is, I broke it up into three modules:
    Tree (filesystem interfaces)
    Path (*just* path interfaces)
    File (a generic filelike object)
Doing it this way had two benefits: First, it put different concepts into different modules. I note that some other virtual filesystem modules also maintained a similar separation - probably for similar reasons. Second, I was able to define an interface which could be used across remote systems -- e.g. I was able to have an FTPTree (built on the standard library ftplib) and SSHTree (built upon paramiko) as well as FileTree (a standard filesystem). This isn't a full-fledged interface system - I just implemented common functionality in a class and then delegated to an ._ops class which passed through the necessary operations. However, I was able to use the various trees and file-like objects interchangeably.
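The layering described above might look roughly like the following. In it, a Tree implements common behaviour and delegates raw operations to an ._ops object. This is a minimal sketch. Tree, LocalOps and their methods are hypothetical names, and only a local backend is shown. An FTP or SSH backend would supply the same method names. A remote tree would probably also need backend-specific path joining instead of os.path.

```python
import os
import tempfile


class LocalOps:
    """Raw operations for the local filesystem (hypothetical backend)."""

    def listdir(self, path):
        return os.listdir(path)

    def isdir(self, path):
        return os.path.isdir(path)


class Tree:
    """Shared logic lives here; backend-specific calls go through self._ops."""

    def __init__(self, ops, cwd):
        self._ops = ops
        self.cwd = cwd

    def _resolve(self, path):
        return os.path.normpath(os.path.join(self.cwd, path))

    def chdir(self, path):
        target = self._resolve(path)
        if not self._ops.isdir(target):
            raise NotADirectoryError(target)
        self.cwd = target  # changes this tree only, not the process

    def listdir(self, path="."):
        return sorted(self._ops.listdir(self._resolve(path)))


process_cwd_before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "x.txt"), "w").close()
    tree = Tree(LocalOps(), tmp)
    tree.chdir("sub")
    listing = tree.listdir()                    # ['x.txt']
    tree_cwd_name = os.path.basename(tree.cwd)  # 'sub'
process_cwd_after = os.getcwd()
```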
One other benefit that I neglected to put into the previous post - I was able to maintain separate cwds for each tree. An example of use: Each tree has its own context, independent of the context of Python:
Remote trees have the same interface:
Trees can interact, regardless of whether they are local or remote:
participants (27)
- Aaron Bingham
- Adam Olsen
- Barry Warsaw
- BJörn Lindqvist
- Bob Ippolito
- Charles Cazabon
- Eric Nieuwland
- Fredrik Lundh
- Gareth McCaughan
- Gustavo J. A. M. Carneiro
- Ian Bicking
- Jason Orendorff
- John J Lee
- Michael Hoffman
- Neil Hodgson
- Nick Coghlan
- Oleg Broytmann
- Paul Moore
- Phillip J. Eby
- Stefan Rank
- Stephen J. Turnbull
- Steven Bethard
- Thomas Wouters
- Toby Dickenson
- Tony Meyer
- Trent Mick
- VanL