
What's the status of PEP 355, Path - Object oriented filesystem paths? We'd like to start using the current reference implementation but we'd like to do it in a manner that minimizes any changes needed when Path becomes part of stdlib. In particular, the reference implementation in http://wiki.python.org/moin/PathModule names the class 'path' instead of 'Path', which seems like a source of name conflict problems. How would you recommend one starts using it now, as is or renaming class path to Path? Thanks -- Luis P Caamano Atlanta, GA USA

I would recommend not using it. IMO it's an amalgam of unrelated functionality (much like the Java equivalent BTW) and the existing os and os.path modules work just fine. Those who disagree with me haven't done a very good job of convincing me, so I expect this PEP to remain in limbo indefinitely, until it is eventually withdrawn or rejected. --Guido On 9/29/06, Luis P Caamano <lcaamano@gmail.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, 29 Sep 2006 12:38:22 -0700, Guido van Rossum <guido@python.org> wrote:
Personally I don't like the path module in question either, and I think that PEP 355 presents an exceptionally weak case, but I do believe that there are several serious use-cases for "object oriented" filesystem access. Twisted has a module for doing this:

http://twistedmatrix.com/trac/browser/trunk/twisted/python/filepath.py

I hope to one day propose this module as a replacement, or update, for PEP 355, but I have neither the time nor the motivation to do it currently. I wouldn't propose it now; it is, for example, mostly undocumented, missing some useful functionality, and has some weird warts (for example, the name of the path-as-string attribute is "path"). However, since it's come up I thought I'd share a few of the use-cases for the general feature, and the things that Twisted has done with it.

1: Testing. If you want to provide filesystem stubs to test code which interacts with the filesystem, it is fragile and extremely complex to temporarily replace the 'os' module; you have to provide a replacement which knows about all the hairy string manipulations one can perform on paths, and you'll almost always forget some weird platform feature. If you have an object with a narrow interface to duck-type instead (for example, a "walk" method which returns similar objects, or an "open" method which returns a file-like object), mocking the appropriate parts of it in a test is a lot easier. The proposed PEP 355 module can be used for this, but its interface is pretty wide and implicit (and portions of it are platform-specific), and because it is also a string you may still have to deal with platform-specific features in tests (or even mixed os.path manipulations, on the same object). This is especially helpful when writing tests for error conditions that are difficult to reproduce on an actual filesystem, such as a network filesystem becoming unavailable.

2: Fast failure, or for lack of a better phrase, "type correctness". PEP 355 gets close to this idea when it talks about datetimes and sockets not being strings. In many cases, code that manipulates filesystems is passing around 'str' or 'unicode' objects, and may be accidentally passed the contents of a file rather than its name, leading to a bizarre failure further down the line. FilePath fails immediately with an "unsupported operand types" TypeError in that case. It also provides nice, immediate feedback at the prompt that the object you're dealing with is supposed to be a filesystem path, with no confusion as to whether it represents a relative or absolute path, or a path relative to a particular directory. Again, the PEP 355 module's subclassing of strings creates problems, because you don't get an immediate and obvious exception if you try to interpolate it with a non-path-name string; it silently "succeeds".

3: Safety. Almost every web server ever written (yes, including twisted.web) has been bitten by the "/../../../" bug at least once. The default child(name) method of Twisted's file path class will only let you go "down" (to go "up" you have to call the parent() method), and will trap obscure platform features like the "NUL" and "CON" files on Windows so that you can't trick a program into manipulating something that isn't actually a file. You can take strings you've read from an untrusted source and pass them to FilePath.child and get something relatively safe out. PEP 355 doesn't mention this at all.

4: Last, but certainly not least: filesystem polymorphism. For an example of what I mean, take a look at this in-development module:

http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py

It's currently far too informal, and incomplete, and there's no specified interface. However, this module shows that by being objects and not module-methods, FilePath objects can also provide a sort of virtual filesystem for Python programs. With FilePath plus ZipPath, you can write Python programs which can operate on a filesystem directory or a directory within a Zip archive, depending on what object they are passed.

On a more subjective note, I've been gradually moving over personal utility scripts from os.path manipulations to twisted.python.filepath for years. I can't say that this will be everyone's experience, but in the same way that Python scripts avoid the class of errors present in most shell scripts (quoting), t.p.f scripts avoid the class of errors present in most Python scripts (off-by-one errors when looking at separators or extensions). I hope that eventually Python will include some form of OO filesystem access, but I am equally hopeful that the current PEP 355 path.py is not it.
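[Editor's sketch] The "only go down" idea from use case 3 can be illustrated with a few lines of Python. This is not Twisted's implementation: the function name child is borrowed from the message, the exception name UnsafeName is made up for this example, and the check covers only separators, "." and "..", empty names and NUL bytes. It does not attempt the Windows device-name handling ("NUL", "CON") mentioned above.

```python
import posixpath


class UnsafeName(Exception):
    """Hypothetical exception: the name could escape the base directory."""


def child(base, name):
    """Join a single path segment onto base, refusing anything that
    is not a plain segment (illustrative check only)."""
    forbidden = ("", ".", "..")
    if name in forbidden or "/" in name or "\\" in name or "\0" in name:
        raise UnsafeName(name)
    return posixpath.join(base, name)


def is_safe(base, name):
    """Convenience wrapper: True if child() accepts the name."""
    try:
        child(base, name)
    except UnsafeName:
        return False
    return True
```

With this, child("/srv/www", "index.html") yields "/srv/www/index.html", while a name such as "../../etc/passwd" read from an untrusted request raises immediately instead of being joined.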

glyph@divmod.com wrote:
+1 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

glyph@divmod.com writes:
I think I agree with this too. For another source of ideas there is the 'py.path' bit of the py lib, which, um, doesn't seem to be documented terribly well, but allows access to remote svn repositories as well as local filesystems (at least). Cheers, mwh -- 3. Syntactic sugar causes cancer of the semicolon. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

OK. Pronouncement: PEP 355 is dead. The authors (or the PEP editor) can update the PEP. I'm looking forward to a new PEP. --Guido On 9/30/06, Michael Hudson <mwh@python.net> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
It would be terrific if you gave us some clue about what is wrong in PEP355, so that the next guy does not waste his time. For instance, I find PEP355 incredibly good for my own path manipulation (much cleaner and more concise than the awful os.path+os+shutil+stat mix), and I have trouble understanding what is *so* wrong with it. You said "it's an amalgam of unrelated functionality", but you didn't say what exactly is "unrelated" for you. Giovanni Bajo

Giovanni Bajo wrote:
Things the PEP 355 path object lumps together:

- string manipulation operations
- abstract path manipulation operations (work for non-existent filesystems)
- read-only traversal of a concrete filesystem (dir, stat, glob, etc)
- addition & removal of files/directories/links within a concrete filesystem

Dumping all of these into a single class is certainly practical from a utility point of view, but it's about as far away from beautiful as you can get, which creates problems from a learnability point of view, and from a capability-based security point of view. PEP 355 itself splits the methods up into 11 distinct categories when listing the interface.

At the very least, I would want to split the interface into separate abstract and concrete interfaces. The abstract object wouldn't care whether or not the path actually existed on the current filesystem (and hence could be relied on to never raise IOError), whereas the concrete object would include the many operations that might need to touch the real IO device. (the PEP has already made a step in the right direction here by removing the methods that accessed a file's contents, leaving that job to the file object where it belongs).

There's a case to be made for the abstract object inheriting from str or unicode for compatibility with existing code, but an alternative would be to enhance the standard library to better support the use of non-basestring objects to describe filesystem paths. A PEP should at least look into what would have to change at the Python API level and the C API level to go that route rather than the inheritance route.

For the concrete interface, the behaviour is very dependent on whether the path refers to a file, directory or symlink on the current filesystem. For an OO filesystem interface, does it really make sense to leave them all lumped into the one class with a bunch of isdir() and islink() style methods? Or does it make more sense to have a method on the abstract object that will return the appropriate kind of filesystem info object? If the latter, then how would you deal with the issue of state coherency (i.e. it was a file when you last touched it on the filesystem, but someone else has since changed it to a link)? (that last question actually lends strong support to the idea of a *single* concrete interface that dynamically responds to changes in the underlying filesystem).

Another key difference between the two is that the abstract objects would be hashable and serialisable, as their state is immutable and independent of the filesystem. For the concrete objects, the only immutable part of their state is the path name - the rest would reflect the state of the filesystem at the current point in time.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
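[Editor's sketch] The abstract/concrete split described above might look roughly like the following. The class names AbstractPath and ConcretePath and all method names are invented for this illustration; nothing here comes from PEP 355 or any proposed API. The abstract class is immutable and hashable and never performs I/O, while the single concrete class asks the filesystem afresh on every call, which is one way to sidestep the state-coherency question.

```python
import os
import posixpath


class AbstractPath:
    """Pure path manipulation; never touches the filesystem."""

    def __init__(self, text):
        self._text = posixpath.normpath(text)

    def parent(self):
        return AbstractPath(posixpath.dirname(self._text))

    def child(self, name):
        return AbstractPath(posixpath.join(self._text, name))

    def __str__(self):
        return self._text

    def __eq__(self, other):
        return isinstance(other, AbstractPath) and self._text == other._text

    def __hash__(self):
        # State is just the normalised text, so hashing is stable.
        return hash(self._text)


class ConcretePath:
    """Wraps an AbstractPath; every method here may perform I/O and
    reports the filesystem state at the moment of the call."""

    def __init__(self, location):
        self.location = location

    def exists(self):
        return os.path.exists(str(self.location))

    def isdir(self):
        return os.path.isdir(str(self.location))
```

Because AbstractPath("/usr/lib/../bin") and AbstractPath("/usr/bin") compare and hash equal, they can be used interchangeably as dictionary keys, while anything that can fail with an I/O error is confined to ConcretePath.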

On Sun, 01 Oct 2006 13:56:53 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think that compatibility can be achieved by having a "pathname" string attribute or similar to convert to a string when appropriate. It's not like datetime inherits from str to facilitate formatting or anything like that.
In C, this is going to be really difficult. Existing C APIs want to use C functions to deal with pathnames, and many libraries are not going to support arbitrary VFS I/O operations. For some libraries, like GNOME or KDE, you'd have to use the appropriate VFS object for their platform.
I don't think returning different types of objects makes sense. This sort of typing is inherently prone to race conditions. If you get a "DirectoryPath" object in Python, and then the underlying filesystem changes so that the name that used to be a directory is now a file (or a device, or UNIX socket, or whatever), how do you change the underlying type?
In non-filesystem cases, for example the "zip path" case, there are inherent failure modes that you can't really do anything about (what if the zip file is removed while you're in the middle of manipulating it?) but there are actual applications which depend on the precise atomic semantics and error conditions associated with moving, renaming, and deleting directories and files, at least on POSIX systems. The way Twisted does this is that FilePath objects explicitly cache the results of "stat" and then have an explicit "restat" method for resynchronizing with the current state of the filesystem. None of their methods for *manipulating* the filesystem look at this state, since it is almost guaranteed to be out of date :).
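[Editor's sketch] The cache-then-restat pattern described above can be shown in miniature. This is an illustration of the idea only, not Twisted's FilePath: the class name CachedStatPath is invented here, the method names restat, getsize and remove are chosen for the example, and a temporary file is used for the demonstration.

```python
import os
import tempfile


class CachedStatPath:
    """Hypothetical sketch: queries use a cached os.stat() result,
    which is only refreshed by an explicit restat()."""

    def __init__(self, path):
        self.path = path
        self._stat = None

    def restat(self):
        # Resynchronise the cache with the filesystem.
        self._stat = os.stat(self.path)

    def getsize(self):
        if self._stat is None:
            self.restat()
        return self._stat.st_size

    def remove(self):
        # Manipulation never consults the (possibly stale) cache.
        os.remove(self.path)
        self._stat = None


fd, name = tempfile.mkstemp()
os.write(fd, b"abc")
os.close(fd)

p = CachedStatPath(name)
before = p.getsize()          # first call stats the file
with open(name, "ab") as f:   # the file grows behind the object's back
    f.write(b"de")
stale = p.getsize()           # still the cached answer
p.restat()
fresh = p.getsize()           # now reflects the new size
p.remove()
```

The demonstration leaves before and stale equal even though the file grew, and only fresh sees the change, which is why the manipulating method ignores the cache.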
It doesn't really make sense to separate these to me; whenever you're serializing or hashing that information, the "mutable" parts should just be discarded.

On 9/30/06, Giovanni Bajo <rasky@develer.com> wrote:
Sorry, no time. But others in this thread clearly agreed with me, so they can guide you. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On 10/1/06, Guido van Rossum <guido@python.org> wrote:
I'd like to write a post mortem for PEP 355. But one important question that hasn't been answered is whether there is a possibility for a path-like PEP to succeed in the future. If so, does the path-object implementation have to prove itself in the wild before it can be included in Python? From earlier posts it seems like you don't like the concept of path objects, which others have found very interesting. If that is the case, then it would be nice to hear it explicitly. :) -- mvh Björn

BJörn Lindqvist wrote:
Let me take a crack at it - I'm always good for spouting off an arrogant opinion :)

Part 1: "Amalgam of Unrelated Functionality"

To me, the Path module felt very much like the "swiss army knife" anti-pattern - a whole lot of functions that had little in common other than the fact that paths were involved. More specifically, I think it's important to separate the notion of paths as abstract "reference" objects from filesystem manipulators. When I call a function that operates on a path, I want to clearly distinguish between a function that merely does a transformation on the path string, vs. one that actually hits the disk. This goes along with the "principle of least surprise" - it should never be the case that I cause an i/o operation to occur when I wasn't expecting it. For example, a function that computes the parent directory of a path should not IMHO be a sibling of a function which tests for the existence or readability of a file.

I tend to think of paths and filesystems as broken down into 3 distinct domains, which are locators, inodes, and files. I realize that not all file systems on all platforms use the term 'inode', and have somewhat different semantics, but they all have some object which fulfills that role.

-- A locator is an abstract description of how to "get to" a resource. A file path is a "locator" in exactly the sense that a URL is. Locators need not refer to 'real' resources in order to be valid. A locator to a non-existent resource still maintains a consistent structure, and can be manipulated and transformed without ever actually dereferencing it. A locator does not, however, have any properties or attributes - you cannot tell, for example, the creation date of a file by looking at its locator.

-- An inode is a descriptor that points to some actual content. It actually lives on the filesystem, and has attributes (such as creation date, last modified date, permissions, etc.).

-- 'Files' are raw content streams - they are the actual bytes that make up the data within the file. Files do not have 'names' or 'dates' directly in and of themselves - only the inodes that describe them do.

Now, I don't insist that everyone in the world should classify things the way I do - I'm just describing how I see it. Were I to come up with my own path-related APIs, they would most likely be divided into 3 sub-modules corresponding to the 3 subdivisions listed above. I would want to make it clear that when you are operating strictly at the locator level, you aren't touching inodes or files; when you are operating at the inode level, you aren't touching file content.

Part 2: Should paths be objects?

I should mention that while I appreciate the power of OOP, I am also very much against the kind of OOP-absolutism that has been taught in many schools of software engineering in the last two decades. There are a lot of really good, formal, well-thought-out systems of program organization, and OOP is only one of many. A classic example is relational algebra, which forms the basis for relational databases - the basic notion that all operations on tabular data can be "composed" or "chained" in exactly the way that mathematical formulas can be. In relational algebra, you can take a view of a view of a view, or a subquery of a query of a view of a table, and so on. Even single, scalar values - such as the count of the number of results of a query - are of the same data type as a 'relation', and can be operated on as such, or fed as input to a subsequent operation.

I bring up the example of relational algebra because it applies to paths as well: There is a kind of "path algebra", where an operation on a path results in another path, which can be operated on further. Now, one way to achieve this kind of path algebra is to make paths an object, and to overload the various functions and operators so that they, too, return paths. However, path algebra can be implemented just as easily in a functional style as in an object style. Properly done, a functional design shouldn't be significantly more bulky or wordy than an object design; the fact that the existing legacy API fails this test has more to do with history than any inherent advantages of OOP vs. functional style. (Actually, the OOP approach has a slight advantage in terms of the amount of syntactic sugar available, but that is [a] an artifact of the current Python feature set, and [b] not necessarily a good thing if it leads to gratuitous, Perl-ish cleverness.)

As a point of comparison, the Java Path API and the C# .Net Path API have similar capabilities; however, the former is object-based whereas the latter is functional and operates on strings. Having used both of them extensively, I find I prefer the C# style, mainly due to the ease of inter-conversion with regular strings - being able to read strings from configuration files, for example, and immediately operate on them without having to convert to path form. I don't find "p.GetParent()" much harder or easier to type than "Path.GetParent( p )"; but I do prefer "Path.GetParent( string )" over "Path( string ).GetParent()". However, this is only a *mild* preference - I could go either way, and wouldn't put up much of a fight about it. (I should note that the Java Path API does *not* follow my scheme of separation between locators and inodes, while the C# API does, which is another reason why I prefer the C# approach.)

Part 3: Does this mean that the current API cannot be improved?

Certainly not! I think everyone (well, almost) agrees that there is much room for improvement in the current APIs. They certainly need to be refactored and recategorized. But I don't think that the solution is to take all of the path-related functions and drop them into a single class, or even a single module.
--- Anyway, I hope that (a) that answers your questions, and (b) isn't too divergent from most people's views about Path. -- Talin
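[Editor's sketch] The "path algebra in a functional style" point can be made concrete with plain functions over strings, each returning another path string so that calls compose. The helper names parent, sibling and with_ext are invented for this example; they are thin wrappers over the standard posixpath module and operate purely at the locator level, never touching the disk.

```python
import posixpath


def parent(p):
    """Locator of the containing directory."""
    return posixpath.dirname(p)


def sibling(p, name):
    """Locator of another entry in the same directory as p."""
    return posixpath.join(parent(p), name)


def with_ext(p, ext):
    """Same locator with its extension replaced."""
    return posixpath.splitext(p)[0] + ext


src = "/project/src/module.c"

# Each result is again a path string, so operations chain freely.
build_dir = sibling(parent(src), "build")
obj = with_ext(posixpath.join(build_dir, posixpath.basename(src)), ".o")
```

Here build_dir comes out as "/project/build" and obj as "/project/build/module.o", and the paths involved need not exist for any of this to work.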

(one additional postscript - One thing I would be interested in is an approach that unifies file paths and URLs so that there is a consistent locator scheme for any resource, whether they be in a filesystem, on a web server, or stored in a zip file.) -- Talin

Talin writes:
+1 But doesn't file:/// do that for files, and couldn't we do something like zipfile:///nantoka.zip#foo/bar/baz.txt? Of course, we'd want to do ziphttp://your.server.net/kantoka.zip#foo/bar/baz.txt, too. That way leads to madness....

Scott Dial writes:
It would make more sense to register protocol handlers to this magical unification of resource manipulation.
I don't think it's that magical, and it's not manipulation, it's location. The question is, register where and on what? For example on my Mac there are some PDFs I want to open in Preview and others in Acrobat. To the extent that I have some classes which are one or the other, I might want to register the handler to a wildcard path object.
But allow me to perform my first channeling of Guido.. YAGNI.
True, but only because when I do need that kind of stuff I'm normally writing Emacs Lisp, not Python. We have a wide variety of functions for manipulating path strings, and they make exactly the distinction between path and inode/content that Talin does (where a path is being manipulated, the function has "filename" in its name, where a file or its metadata is being accessed, the function's name contains "file"). Nonetheless there are two or three places where programmers I respect have chosen to invent path classes to handle hairy special cases. These classes are very useful in those special cases. One place where this gets especially hairy is in the TRAMP package, which allows you to construct "remote paths" involving (for example) logging into host A by ssh, from there to host B by ssh, and finally a "relay download" of the content from host C to the local host by scp. The net effect is that you can specify the path in your "open file" dialog, and Emacs does the rest automatically; the only differences the user sees between that and a local file are the length of the path string and the time it takes to actually access the contents. Once you've done that, that process is embedded into Emacs's notion of the "current directory", so you can list the directory containing the resource, or access siblings, very conveniently. I don't expect to reproduce that functionality in Python personally, but such use cases do exist. Whether a general path class can be invented that doesn't accumulate cruft faster than use cases is another issue.

Scott Dial wrote:
I'm thinking that it was a tactical error on my part to throw in the whole "unified URL / filename namespace" idea, which really has nothing to do with the topic. Let's drop it, or start another topic, and let this thread focus on critiques of the path module, which is probably more relevant at the moment. -- Talin

stephen@xemacs.org wrote:
file:/// does indeed do it, but only the network module understands strings in that format. Ideally, you should be able to pass "file:///..." to a regular "open" function. I wouldn't expect it to be able to understand "http://". But the "file:" protocol should always be supported. In other words, I'm not proposing that the built-in file i/o package suddenly grow an understanding of network schema types. All I am proposing is a unified name space. - Talin

Talin wrote:
Ideally, you should be able to pass "file:///..." to a regular "open" function.
I'm not so sure about that. Consider that "file:///foo.bar" is a valid relative pathname on Unix to a file called "foo.bar" in a directory called "file:". That's not to say there shouldn't be a function available that understands it, but I wouldn't want it built into all functions that take pathnames. -- Greg
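[Editor's sketch] The ambiguity pointed out above is easy to check with the standard posixpath module, which treats the string as an ordinary relative path. The variable names are chosen for this example only.

```python
import posixpath

candidate = "file:///foo.bar"

# To POSIX path rules this is just a relative path: no leading slash.
absolute = posixpath.isabs(candidate)

# The directory part is literally "file:" and the file is "foo.bar".
directory, filename = posixpath.split(candidate)

# Normalisation collapses the repeated slashes like any other path.
normalised = posixpath.normpath(candidate)
```

Here absolute is False, directory is "file:", filename is "foo.bar" and normalised is "file:/foo.bar", so a function that silently reinterpreted such strings as URLs would change the meaning of a legal pathname.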

Talin wrote:
+1 from me. (for both the fraction I quoted and everything else you said, including the locator/inode/file distinction - although I'd also add that 'symbolic link' and 'directory' exist at a similar level as 'file'). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Nick Coghlan wrote:
I would tend towards classifying directory operations as inode-level operations, in that you are working at the "filesystem as graph" level, rather than the "stream of bytes" level. When you iterate over a directory, what you are getting back is effectively inodes (well, directory entries are distinct from inodes in the underlying filesystem, but from Python there's no practical distinction.) If I could draw a UML diagram in ASCII, I would have "inode --> points to --> directory or file" and "directory --> contains * --> inode". That would hopefully make things clearer. Symbolic links, I am not so sure about; in some ways, hard links are easier to classify.

---

Having done a path library myself (in C++, for our code base at work), the trickiest part is getting the Windows path manipulations right, and fitting them into a model that allows writing of platform-agnostic code. This is especially vexing when you realize that it's often useful to manipulate unix-style paths even when running under Win32 and vice versa. A prime example is that I have a lot of Python code at work that manipulates Perforce client spec files. The path specifications in these files are platform-agnostic, and use forward slashes regardless of the host platform, so "os.path.normpath" doesn't do the right thing for me.
Cheers, Nick.

On Wednesday 25 October 2006 13:16, Talin wrote:
Never heard of it. It's not in the standard library, is it? I don't see it in the table of contents or the index.
This is a documentation bug. :-( I'd thought they were mentioned *somewhere*, but it looks like I'm wrong. os.path is an alias for one of several different real modules; which is selected depends on the platform. I see the following: macpath, ntpath, os2emxpath, riscospath. (ntpath is used for all Windows versions, not just NT.) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>

At 10:16 AM 10/25/2006 -0700, Talin wrote:
posixpath, ntpath, macpath, et al are the platform-specific path manipulation modules that are aliased to os.path. However, each of these modules' string path manipulation functions can be imported and used on any platform. See below: Linux: Python 2.3.5 (#1, Aug 25 2005, 09:17:44) [GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
Windows: Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information.
Note, therefore, that any "path object" system should also allow you to create and manipulate foreign paths. That is, it should have variants for each path type, rather than being locked to the local platform's path strings. Of course, the most common need for this is manipulating posix paths on non-posix platforms, but sometimes one must deal with Windows paths on Unix, too.
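[Editor's sketch] A short illustration of the point above, using only the standard ntpath and posixpath modules, both of which can be imported on any platform. The example paths (including the depot-style one) are made up for the demonstration.

```python
import ntpath
import posixpath

# Windows path rules, usable even when running on Unix.
win = ntpath.join("C:\\Projects", "demo", "setup.py")
drive, rest = ntpath.splitdrive(win)
win_name = ntpath.basename(win)

# POSIX rules do not see backslashes as separators at all.
posix_view = posixpath.basename(win)

# Forward-slash paths are handled by posixpath regardless of host OS.
depot = posixpath.normpath("//depot/main/src/../include/api.h")
```

ntpath splits the Windows path into the drive "C:" and the remainder, and finds the file name "setup.py", whereas posixpath.basename returns the whole string unchanged because it contains no "/". The depot-style path normalises to "//depot/main/include/api.h" on every platform.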

Talin wrote:
(Actually, the OOP approach has a slight advantage in terms of the amount of syntactic sugar available,
Even if you don't use any operator overloading, there's still the advantage that an object provides a namespace for its methods. Without that, you either have to use fairly verbose function names or keep qualifying them with a module name. Code that uses the current path functions tends to contain a lot of os.path.this(os.path.that(...)) stuff which is quite tedious to write and read. Another consideration is that having paths be a distinct data type allows for the possibility of file system references that aren't just strings. In Classic MacOS, for example, the definitive way of referencing a file is by a (volRefNum, dirID, name) tuple, and textual paths aren't guaranteed to be unique or even to exist.
A compromise might be to have all the "path algebra" operations be methods, and everything else functions which operate on path objects. That would make sense, because the path algebra ought to be a closed set of operations that's tightly coupled to the platform's path semantics. -- Greg

Greg Ewing wrote:
Given the flexibility that Python allows in naming the modules that you import, I'm not sure that this is a valid objection -- you can make the module name as short as you feel comfortable with.
That's true of textual paths in general - i.e. even on unix, textual paths aren't guaranteed to be unique or exist. It's been a while since I used classic MacOS - how do you handle things like configuration files with path names in them?
Personally, this is one of those areas where I am strongly tempted to violate TOOWTDI - I can see use cases where string-based paths would be more convenient and less typing, and other use cases where object-based paths would be more convenient and less typing. If I were designing a path library, I would create a string-based system as the lowest level, and an object based system on top of it (the reason for doing it that way is simply so that people who want to use strings don't have to suffer the cost of creating temporary path objects to do simple things like joins.) Moreover, I would keep the naming conventions of the two systems similar, if at all possible - thus, the object methods would have the same (short) names as the functions within the module. So for example:

    # Import new, refactored module io.path
    from io import path

    # Case 1 using strings
    path1 = path.join( "/Libraries/Frameworks", "Python.Framework" )
    parent = path.parent( path1 )

    # Case 2 using objects
    pathobj = path.Path( "/Libraries/Frameworks" )
    pathobj += "Python.Framework"
    parent = pathobj.parent()

Let me riff on this just a bit more - don't take this all too seriously though:

Refactored organization of path-related modules (under a new name so as not to conflict with existing modules):

    io.path -- path manipulations
    io.dir -- directory functions, including dirwalk
    io.fs -- dealing with filesystem objects (inodes, symlinks, etc.)
    io.file -- file read / write streams

    # Import directory module
    import io.dir

    # String based API
    for entry in io.dir.listdir( "/Library/Frameworks" ):
        print entry # Entry is a string

    # Object based API
    dir = io.dir.Directory( "/Library/Frameworks" )
    for entry in dir: # Iteration protocol on dir object
        print entry # entry is an obj, but __str__() returns path text

    # Dealing with various filesystems: pass in a format parameter
    dir = io.dir.Directory( "/Library/Frameworks" )
    print entry.path( format="NT" ) # entry printed in NT format

    # Or you can just use a format specifier for PEP 3101 string format:
    print "Path in local system format is {0}".format( entry )
    print "Path in NT format is {0:NT}".format( entry )
    print "Path in OS X format is {0:OSX}".format( entry )

Anyway, off the top of my head, that's what a refactored path API would look like if I were doing it :) (Yes, the names are bad, can't think of better ATM.) -- Talin
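[Editor's sketch] The two-layer design proposed above (string functions at the bottom, a thin object layer on top with the same short names) can be tried out as a runnable toy. The names join, parent and Path follow the message, but the implementation is an assumption made for this example: it is built on the standard posixpath module, lives in an ordinary namespace rather than a new io.path module, and covers only the operations shown in the message.

```python
import posixpath


# Lowest level: plain functions over strings.
def join(base, *parts):
    return posixpath.join(base, *parts)


def parent(p):
    return posixpath.dirname(p)


# Object layer: same short names, delegating to the string functions.
class Path:
    def __init__(self, text):
        self.text = text

    def __add__(self, name):
        # Supports both "pathobj + name" and "pathobj += name".
        return Path(join(self.text, name))

    def parent(self):
        return Path(parent(self.text))

    def __str__(self):
        return self.text


# Case 1 using strings
path1 = join("/Libraries/Frameworks", "Python.Framework")
parent1 = parent(path1)

# Case 2 using objects
pathobj = Path("/Libraries/Frameworks")
pathobj += "Python.Framework"
parent2 = pathobj.parent()
```

Both cases arrive at the same text, and string users never pay for a temporary Path object, which is the stated reason for putting the string layer at the bottom.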

On Oct 25, 2006, at 10:48 PM, Talin wrote:
You aren't supposed to use paths at all. You're supposed to use an Alias whenever you're doing long term storage of a reference to a file. This allows the user to move the file around on the disk without breaking the reference, which is nice. The alias is an opaque datastructure which contains a bunch of redundant information used to locate the file. In particular, both pathname and (volumeId, dirId, name), as well as some other stuff like file size, etc. to help do fuzzy matching if the original file can't be found via the obvious locators. And for files on a file server, it also contains information on how to reconnect to the server if necessary. Much of the alias infrastructure carries over into OSX, although the strictures against using paths have been somewhat watered down. At least in OSX, you don't have the issue of the user renaming the boot volume and thus breaking every path someone ill-advisedly stored (since volume name was part of the path). For an example of aliases in OSX, open a file in TextEdit, see that it gets into the "recent items" menu. Now, move it somewhere else and rename it, and notice that it's still accessible from the menu. Separately, try deleting the file and renaming another to the same name. Notice that it also succeeds in referencing this new file. Hm, how's this related to python? I'm not quite sure. :) James

Talin wrote:
That's true of textual paths in general - i.e. even on unix, textual paths aren't guaranteed to be unique or exist.
What I mean is that it's possible for two different files to have the same pathname (since you can mount two volumes with identical names at the same time), or for a file to exist on disk yet not be accessible via any pathname (because it would exceed 255 characters). I'm not aware of any analogous situations in unix.
It's been a while since I used classic MacOS - how do you handle things like configuration files with path names in them?
True native classic MacOS software generally doesn't use pathnames. Things like textual config files are really a foreign concept to it. If you wanted to store config info, you'd probably store an alias, which points at the moral equivalent of the file's inode number, and use a GUI for editing it. However all this is probably not very relevant now, since as far as I know, classic MacOS is no longer supported in current Python versions. I'm just pointing out that the flexibility would be there if any similarly offbeat platform needed to be supported in the future.
I don't think that expressing one platform's pathnames in the format of another is something you can do in general, e.g. going from Windows to Unix, what do you do with the drive letter? You can only really do it if you have some sort of network file system connection, and then you need more information than just the path in order to do the translation. -- Greg

Greg Ewing wrote:
I'm not sure that PEP 355 included any such support - IIRC, the path object was a subclass of string. That isn't, however, a defense against what you are saying - just because neither the current system nor the proposed improvement supports the kinds of file references you are speaking of, doesn't mean it shouldn't be done. However, this does kind of suck for a cross-platform scripting language like Python. It means that any cross-platform app which requires access to multiple data files that contain inter-file references essentially has to implement its own virtual file system. (Python module imports being a case in point.) One of the things that I really love about Python programming is that I can sit down and start hacking on a new project without first having to go through an agonizing political decision about what platforms I should support. It used to be that I would spend hours ruminating over things like "Well...if I want any market share at all, I really should implement this as a Windows program...but on the other hand, I won't enjoy writing it nearly as much." Then comes along Python and removes all of that bothersome hacker-angst. Because of this, I am naturally disinclined to incorporate into my programs any concept which doesn't translate to other platforms. I don't mind writing some platform-specific code, as long as it doesn't take over my program. It seems that any Python program that manipulated paths would have to be radically different in the environment that you describe. How about this: In my ontology of path APIs given earlier, I would tend to put the MacOS file reference in the category of "file locator schemes other than paths". In other words, what you are describing isn't IMHO a path at all, but it is like a path in that it describes how to get to a file. (It's almost like an inode or dirent in some ways.)
An alternative approach is to try and come up with an encoding scheme that allows you to represent all of that platform-specific semantics in a string. This leaves you with the unhappy choice of "inventing" a new path syntax for an old platform, however.
Yeah, probably not. See, I told you not to take it too seriously! But I do feel that it's important to be able to manipulate posix-style path syntax on non-posix platforms, given how many cross-platform applications there are that have a cross-platform path syntax. In my own work, I find that drive letters are never explicitly specified in config files. Any application such as a parser, template generator, or resource manager (in other words, any application whose data files are routinely checked in to the source control system or shared across a network) tends to 'see' only relative paths in its input files, and embedding absolute paths is considered an error on the user's part. Of course, those same apps *do* internally convert all those relative paths to absolute, so that they can be compared and resolved with respect to some common base. Then again, in my opinion, the only *really* absolute paths are fully-qualified URLs. So there. :)

Talin wrote:
It seems that any Python program that manipulated paths would have to be radically different in the environment that you describe.
I can sympathise with that. The problem is really inherent in the nature of the platforms -- it's just not possible to do everything in a native classic MacOS way and be cross-platform at the same time. There has to be a compromise somewhere. With classic MacOS the compromise was usually to use pathnames and to heck with the consequences. You could get away with it most of the time.
Yes, that's true. Calling it a "path" would be something of a historical misnomer.
Yes, I thought of that, too. That's what you would have to do under the current scheme if you ever encountered a platform which truly had no textual representation of file locations. But realistically, it seems unlikely that such a platform will be invented in the foreseeable future (even classic MacOS *had* a notion of paths, even if it wasn't the preferred representation). So all this is probably YAGNI. -- Greg

BJörn Lindqvist wrote:
So...how's that post mortem coming along? Did you get a sufficient answer to your questions? And the more interesting question is, will the effort to reform Python's path functionality continue? From reading all the responses to your post, I feel that the community is on the whole supportive of the idea of refactoring os.path and friends, but they prefer a different approach, and several of the responses sketch out some suggestions for what that approach might be. So what happens next? -- Talin

On 10/28/06, Talin <talin@acm.org> wrote:
Yes and no. All posts have very exhaustively explained why the implementation in PEP 355 is far from optimal. And I can see why it is. However, what I am uncertain of is Guido's opinion on the background and motivation of the PEP: "Many have felt that the API for manipulating file paths as offered in the os.path module is inadequate." "Currently, Python has a large number of different functions scattered over half a dozen modules for handling paths. This makes it hard for newbies and experienced developers to choose the right method." IMHO, the current API is very messy. But when it comes to PEPs, it is mostly Guido's opinion that counts. :) Unless he sees a problem with the current situation, there is no point in writing more PEPs.
And the more interesting question is, will the effort to reform Python's path functionality continue?
I certainly hope so. But maybe it is better to target Python 3000, or maybe the Python devs already have ideas for what they want the path APIs to look like?
So what happens next?
I really hope that Guido will give his input when he has more time. Mvh Björn

BJörn Lindqvist wrote:
I think targeting Py3K is a good idea. The whole purpose of Py3K is to "clean up the messes" of past decisions, and to that end, a certain amount of backwards-compatibility breakage will be allowed (although if that can be avoided, so much the better.) And to the second point, having been following the Py3K list, I don't think anyone has expressed any preconceived notions of how they want things to look (well, except I know I do, but I'm not a core dev :) :).
So what happens next?
I really hope that Guido will give his input when he has more time.
First bit of advice is, don't hold your breath. Second bit of advice is, if you really do want Guido's feedback (or the core python devs'), start by creating a (short) list of the outstanding points of controversy to be resolved. Once those issues have been decided, then proceed to the next stage, building consensus by increments. Basically, anything that requires Guido to read more than a page of material isn't going to get done quickly. At least, in my experience :)
Mvh Björn

On 9/30/06, Giovanni Bajo <rasky@develer.com> wrote:
Here are my guesses. I believe Guido rejected this PEP for a lot of reasons. By the way, what I'm about to do is known as "channeling Guido (badly)" and I'm pretty sure it annoys him. Sorry, Guido. Please don't treat the following as authoritative; I have never met Guido and obviously I cannot speak for him. - I don't think Guido ever saw much benefit from "path objects". That is, the Motivation was not compelling. I think the main motivation is to eliminate some clutter and add a handful of useful methods to the stdlib, so it's easy to see how this could be the case. - Guido just flat-out didn't like the looks of the PEP. Too much weirdness. (path.py contains more weirdness, including some stuff Guido particularly disliked, and I think it's fair to say that PEP355 suffered somewhat by association.) - Any proposal to add a Second Way To Do It has to meet a very high standard. PEP355 was too big to be considered an incremental change. Yet it didn't even attempt to fix all the perceived problems with the existing APIs. A more thorough job would have had a better chance. - Nobody liked the API design--too many methods. - Now we're hearing rumors of better ideas out there, which comes as a relief. I suspect any one of these could have scuttled the proposal. -j

I would recommend not using it. IMO it's an amalgam of unrelated functionality (much like the Java equivalent BTW) and the existing os and os.path modules work just fine. Those who disagree with me haven't done a very good job of convincing me, so I expect this PEP to remain in limbo indefinitely, until it is eventually withdrawn or rejected. --Guido On 9/29/06, Luis P Caamano <lcaamano@gmail.com> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, 29 Sep 2006 12:38:22 -0700, Guido van Rossum <guido@python.org> wrote:
Personally I don't like the path module in question either, and I think that PEP 355 presents an exceptionally weak case, but I do believe that there are several serious use-cases for "object oriented" filesystem access. Twisted has a module for doing this: http://twistedmatrix.com/trac/browser/trunk/twisted/python/filepath.py I hope to one day propose this module as a replacement, or update, for PEP 355, but I have neither the time nor the motivation to do it currently. I wouldn't propose it now; it is, for example, mostly undocumented, missing some useful functionality, and has some weird warts (for example, the name of the path-as-string attribute is "path"). However, since it's come up I thought I'd share a few of the use-cases for the general feature, and the things that Twisted has done with it. 1: Testing. If you want to provide filesystem stubs to test code which interacts with the filesystem, it is fragile and extremely complex to temporarily replace the 'os' module; you have to provide a replacement which knows about all the hairy string manipulations one can perform on paths, and you'll almost always forget some weird platform feature. If you have an object with a narrow interface to duck-type instead; for example, a "walk" method which returns similar objects, or an "open" method which returns a file-like object, mocking the appropriate parts of it in a test is a lot easier. The proposed PEP 355 module can be used for this, but its interface is pretty wide and implicit (and portions of it are platform-specific), and because it is also a string you may still have to deal with platform-specific features in tests (or even mixed os.path manipulations, on the same object). This is especially helpful when writing tests for error conditions that are difficult to reproduce on an actual filesystem, such as a network filesystem becoming unavailable. 2: Fast failure, or for lack of a better phrase, "type correctness". 
PEP 355 gets close to this idea when it talks about datetimes and sockets not being strings. In many cases, code that manipulates filesystems is passing around 'str' or 'unicode' objects, and may be accidentally passed the contents of a file rather than its name, leading to a bizarre failure further down the line. FilePath fails immediately with an "unsupported operand types" TypeError in that case. It also provides nice, immediate feedback at the prompt that the object you're dealing with is supposed to be a filesystem path, with no confusion as to whether it represents a relative or absolute path, or a path relative to a particular directory. Again, the PEP 355 module's subclassing of strings creates problems, because you don't get an immediate and obvious exception if you try to interpolate it with a non-path-name string, it silently "succeeds". 3: Safety. Almost every web server ever written (yes, including twisted.web) has been bitten by the "/../../../" bug at least once. The default child(name) method of Twisted's file path class will only let you go "down" (to go "up" you have to call the parent() method), and will trap obscure platform features like the "NUL" and "CON" files on Windows so that you can't trick a program into manipulating something that isn't actually a file. You can take strings you've read from an untrusted source and pass them to FilePath.child and get something relatively safe out. PEP 355 doesn't mention this at all. 4: last, but certainly not least: filesystem polymorphism. For an example of what I mean, take a look at this in-development module: http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py It's currently far too informal, and incomplete, and there's no specified interface. However, this module shows that by being objects and not module-methods, FilePath objects can also provide a sort of virtual filesystem for Python programs. 
With FilePath plus ZipPath, you can write Python programs which can operate on a filesystem directory or a directory within a Zip archive, depending on what object they are passed. On a more subjective note, I've been gradually moving over personal utility scripts from os.path manipulations to twisted.python.filepath for years. I can't say that this will be everyone's experience, but in the same way that Python scripts avoid the class of errors present in most shell scripts (quoting), t.p.f scripts avoid the class of errors present in most Python scripts (off-by-one errors when looking at separators or extensions). I hope that eventually Python will include some form of OO filesystem access, but I am equally hopeful that the current PEP 355 path.py is not it.
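The "only go down" rule from point 3 above can be sketched in a few lines. This is an illustration under assumptions, not Twisted's actual implementation: the names SafePath and InsecurePath and the RESERVED set are hypothetical, the reserved-name list is incomplete, and the check is deliberately conservative (it refuses anything that is not a single ordinary segment).

```python
import posixpath


class InsecurePath(Exception):
    """Raised when a child name could escape its parent directory."""


# Hypothetical and incomplete set of Windows device names to refuse.
RESERVED = {"NUL", "CON", "PRN", "AUX"}


class SafePath:
    """Illustrative path object; deliberately not a str subclass."""

    def __init__(self, path):
        self.path = posixpath.normpath(path)

    def child(self, name):
        # Accept exactly one ordinary path segment and nothing else.
        if name in ("", ".", "..") or "/" in name or "\\" in name or "\0" in name:
            raise InsecurePath(name)
        # Refuse device names such as "NUL" or "con.txt".
        if name.split(".")[0].upper() in RESERVED:
            raise InsecurePath(name)
        return SafePath(posixpath.join(self.path, name))

    def parent(self):
        # Going "up" is a separate, explicit operation.
        return SafePath(posixpath.dirname(self.path))


root = SafePath("/srv/www")
page = root.child("index.html")  # stays under /srv/www
```

Because SafePath is not a string, accidentally concatenating it with file contents raises a TypeError straight away, which is the "fast failure" behaviour described in point 2.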

glyph@divmod.com wrote:
+1 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

glyph@divmod.com writes:
I think I agree with this too. For another source of ideas there is the 'py.path' bit of the py lib, which, um, doesn't seem to be documented terribly well, but allows access to remote svn repositories as well as local filesystems (at least). Cheers, mwh -- 3. Syntactic sugar causes cancer of the semicolon. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

OK. Pronouncement: PEP 355 is dead. The authors (or the PEP editor) can update the PEP. I'm looking forward to a new PEP. --Guido On 9/30/06, Michael Hudson <mwh@python.net> wrote:
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
It would be terrific if you gave us some clue about what is wrong in PEP355, so that the next guy does not waste his time. For instance, I find PEP355 incredibly good for my own path manipulation (much cleaner and concise than the awful os.path+os+shutil+stat mix), and I have trouble understanding what is *so* wrong with it. You said "it's an amalgam of unrelated functionality", but you didn't say what exactly is "unrelated" for you. Giovanni Bajo

Giovanni Bajo wrote:
Things the PEP 355 path object lumps together: - string manipulation operations - abstract path manipulation operations (work for non-existent filesystems) - read-only traversal of a concrete filesystem (dir, stat, glob, etc) - addition & removal of files/directories/links within a concrete filesystem Dumping all of these into a single class is certainly practical from a utility point of view, but it's about as far away from beautiful as you can get, which creates problems from a learnability point of view, and from a capability-based security point of view. PEP 355 itself splits the methods up into 11 distinct categories when listing the interface. At the very least, I would want to split the interface into separate abstract and concrete interfaces. The abstract object wouldn't care whether or not the path actually existed on the current filesystem (and hence could be relied on to never raise IOError), whereas the concrete object would include the many operations that might need to touch the real IO device. (the PEP has already made a step in the right direction here by removing the methods that accessed a file's contents, leaving that job to the file object where it belongs). There's a case to be made for the abstract object inheriting from str or unicode for compatibility with existing code, but an alternative would be to enhance the standard library to better support the use of non-basestring objects to describe filesystem paths. A PEP should at least look into what would have to change at the Python API level and the C API level to go that route rather than the inheritance route. For the concrete interface, the behaviour is very dependent on whether the path refers to a file, directory or symlink on the current filesystem. For an OO filesystem interface, does it really make sense to leave them all lumped into the one class with a bunch of isdir() and islink() style methods?
Or does it make more sense to have a method on the abstract object that will return the appropriate kind of filesystem info object? If the latter, then how would you deal with the issue of state coherency (i.e. it was a file when you last touched it on the filesystem, but someone else has since changed it to a link)? (that last question actually lends strong support to the idea of a *single* concrete interface that dynamically responds to changes in the underlying filesystem). Another key difference between the two is that the abstract objects would be hashable and serialisable, as their state is immutable and independent of the filesystem. For the concrete objects, the only immutable part of their state is the path name - the rest would reflect the state of the filesystem at the current point in time. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
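One way to picture the abstract/concrete split described above is the following minimal sketch. The class and method names (AbstractPath, ConcretePath, concrete) are hypothetical and are not taken from PEP 355 or any existing library; POSIX-style names are assumed for the abstract part.

```python
import os
import posixpath


class AbstractPath:
    """Pure name manipulation; never touches the filesystem."""

    def __init__(self, name):
        self._name = posixpath.normpath(name)

    def __str__(self):
        return self._name

    def __eq__(self, other):
        return isinstance(other, AbstractPath) and self._name == other._name

    def __hash__(self):
        # Safe to hash: the state is just the name and is never reassigned.
        return hash(self._name)

    def parent(self):
        return AbstractPath(posixpath.dirname(self._name))

    def join(self, *parts):
        return AbstractPath(posixpath.join(self._name, *parts))

    def concrete(self):
        return ConcretePath(self)


class ConcretePath:
    """Operations that may hit the disk; a single class for all file kinds."""

    def __init__(self, abstract):
        self.abstract = abstract

    def exists(self):
        return os.path.exists(str(self.abstract))

    def is_dir(self):
        # Asked afresh on every call, so the answer tracks the filesystem.
        return os.path.isdir(str(self.abstract))


a = AbstractPath("/tmp/x/../y")
b = AbstractPath("/tmp/y")
```

Here a and b compare and hash equal without any I/O, while every ConcretePath query re-reads the filesystem, which matches the "single concrete interface" option raised above.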

On Sun, 01 Oct 2006 13:56:53 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think that compatibility can be achieved by having a "pathname" string attribute or similar to convert to a string when appropriate. It's not like datetime inherits from str to facilitate formatting or anything like that.
In C, this is going to be really difficult. Existing C APIs want to use C functions to deal with pathnames, and many libraries are not going to support arbitrary VFS I/O operations. For some libraries, like GNOME or KDE, you'd have to use the appropriate VFS object for their platform.
I don't think returning different types of objects makes sense. This sort of typing is inherently prone to race conditions. If you get a "DirectoryPath" object in Python, and then the underlying filesystem changes so that the name that used to be a directory is now a file (or a device, or UNIX socket, or whatever), how do you change the underlying type?
In non-filesystem cases, for example the "zip path" case, there are inherent failure modes that you can't really do anything about (what if the zip file is removed while you're in the middle of manipulating it?) but there are actual applications which depend on the precise atomic semantics and error conditions associated with moving, renaming, and deleting directories and files, at least on POSIX systems. The way Twisted does this is that FilePath objects explicitly cache the results of "stat" and then have an explicit "restat" method for resynchronizing with the current state of the filesystem. None of their methods for *manipulating* the filesystem look at this state, since it is almost guaranteed to be out of date :).
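The cache-then-restat idea can be illustrated roughly as follows. This is a sketch of the general pattern only, assuming a hypothetical CachedStatPath class; it does not reproduce Twisted's FilePath code or its exact method set.

```python
import os

_UNKNOWN = object()  # sentinel: "stat() has not been called yet"


class CachedStatPath:
    """Hypothetical path object that remembers its last stat() result."""

    def __init__(self, path):
        self.path = path
        self._stat = _UNKNOWN

    def restat(self):
        """Re-read filesystem state; a missing path is cached as None."""
        try:
            self._stat = os.stat(self.path)
        except OSError:
            self._stat = None
        return self

    def exists(self):
        # Read-only queries use the cache and may therefore be stale.
        if self._stat is _UNKNOWN:
            self.restat()
        return self._stat is not None
```

If the file is deleted behind the object's back, exists() keeps returning the cached answer until restat() is called, which is why operations that modify the filesystem should go to the OS directly rather than trust the cache.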
It doesn't really make sense to separate these to me; whenever you're serializing or hashing that information, the "mutable" parts should just be discarded.

On 9/30/06, Giovanni Bajo <rasky@develer.com> wrote:
Sorry, no time. But others in this thread clearly agreed with me, so they can guide you. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On 10/1/06, Guido van Rossum <guido@python.org> wrote:
I'd like to write a post mortem for PEP 355. But one important question that hasn't been answered is whether there is a possibility for a path-like PEP to succeed in the future. If so, does the path-object implementation have to prove itself in the wild before it can be included in Python? From earlier posts it seems like you don't like the concept of path objects, which others have found very interesting. If that is the case, then it would be nice to hear it explicitly. :) -- mvh Björn

BJörn Lindqvist wrote:
Let me take a crack at it - I'm always good for spouting off an arrogant opinion :) Part 1: "Amalgam of Unrelated Functionality" To me, the Path module felt very much like the "swiss army knife" anti-pattern - a whole lot of functions that had little in common other than the fact that paths were involved. More specifically, I think it's important to separate the notion of paths as abstract "reference" objects from filesystem manipulators. When I call a function that operates on a path, I want to clearly distinguish between a function that merely does a transformation on the path string, vs. one that actually hits the disk. This goes along with the "principle of least surprise" - it should never be the case that I cause an i/o operation to occur when I wasn't expecting it. For example, a function that computes the parent directory of a path should not IMHO be a sibling of a function which tests for the existence or readability of a file. I tend to think of paths and filesystems as broken down into 3 distinct domains, which are locators, inodes, and files. I realize that not all file systems on all platforms use the term 'inode', and have somewhat different semantics, but they all have some object which fulfills that role. -- A locator is an abstract description of how to "get to" a resource. A file path is a "locator" in exactly the sense that a URL is. Locators need not refer to 'real' resources in order to be valid. A locator to a non-existent resource still maintains a consistent structure, and can be manipulated and transformed without ever actually dereferencing it. A locator does not, however, have any properties or attributes - you cannot tell, for example, the creation date of a file by looking at its locator. -- An inode is a descriptor that points to some actual content. It actually lives on the filesystem, and has attributes (such as creation date, last modified date, permissions, etc.)
-- 'Files' are raw content streams - they are the actual bytes that make up the data within the file. Files do not have 'names' or 'dates' directly in and of themselves - only the inodes that describe them do. Now, I don't insist that everyone in the world should classify things the way I do - I'm just describing how I see it. Were I to come up with my own path-related APIs, they would most likely be divided into 3 sub-modules corresponding to the 3 subdivisions listed above. I would want to make it clear that when you are operating strictly at the locator level, you aren't touching inodes or files; when you are operating at the inode level, you aren't touching file content. Part 2: Should paths be objects? I should mention that while I appreciate the power of OOP, I am also very much against the kind of OOP-absolutism that has been taught in many schools of software engineering in the last two decades. There are a lot of really good, formal, well-thought-out systems of program organization, and OOP is only one of many. A classic example is relational algebra, which forms the basis for relational databases - the basic notion that all operations on tabular data can be "composed" or "chained" in exactly the way that mathematical formulas can be. In relational algebra, you can take a view of a view of a view, or a subquery of a query of a view of a table, and so on. Even single, scalar values - such as the count of the number of results of a query - are of the same data type as a 'relation', and can be operated on as such, or fed as input to a subsequent operation. I bring up the example of relational algebra because it applies to paths as well: There is a kind of "path algebra", where an operation on a path results in another path, which can be operated on further. Now, one way to achieve this kind of path algebra is to make paths an object, and to overload the various functions and operators so that they, too, return paths.
However, path algebra can be implemented just as easily in a functional style as in an object style. Properly done, a functional design shouldn't be significantly more bulky or wordy than an object design; the fact that the existing legacy API fails this test has more to do with history than any inherent advantages of OOP vs. functional style. (Actually, the OOP approach has a slight advantage in terms of the amount of syntactic sugar available, but that is [a] an artifact of the current Python feature set, and [b] not necessarily a good thing if it leads to gratuitous, Perl-ish cleverness.) As a point of comparison, the Java Path API and the C# .Net Path API have similar capabilities, however the former is object-based whereas the latter is functional and operates on strings. Having used both of them extensively, I find I prefer the C# style, mainly due to the ease of intra-conversion with regular strings - being able to read strings from configuration files, for example, and immediately operate on them without having to convert to path form. I don't find "p.GetParent()" much harder or easier to type than "Path.GetParent( p )"; but I do prefer "Path.GetParent( string )" over "Path( string ).GetParent()". However, this is only a *mild* preference - I could go either way, and wouldn't put up much of a fight about it. (I should note that the Java Path API does *not* follow my scheme of separation between locators and inodes, while the C# API does, which is another reason why I prefer the C# approach.) Part 3: Does this mean that the current API cannot be improved? Certainly not! I think everyone (well, almost) agrees that there is much room for improvement in the current APIs. They certainly need to be refactored and recategorized. But I don't think that the solution is to take all of the path-related functions and drop them into a single class, or even a single module.
--- Anyway, I hope that (a) that answers your questions, and (b) isn't too divergent from most people's views about Path. -- Talin
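As a rough illustration of the functional "path algebra" and of the locator/inode separation argued for above, here is a small sketch. The function names (parent, child, with_ext, modified_time) are made up for this example and POSIX-style strings are assumed; it is not a proposed API.

```python
import os
import posixpath


# Locator level: pure string-to-string functions, no I/O at all.
def parent(p):
    return posixpath.dirname(p)


def child(p, name):
    return posixpath.join(p, name)


def with_ext(p, ext):
    return posixpath.splitext(p)[0] + ext


# Inode level: kept separate because it asks the filesystem and may fail.
def modified_time(p):
    return os.stat(p).st_mtime


# Locator functions compose freely, since each takes and returns a plain string.
backup = with_ext(child(parent("/etc/app/site.conf"), "site.old"), ".bak")
```

Every value in the chain stays an ordinary string, so paths read from a config file can be fed in directly; only modified_time would ever touch the disk.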

(one additional postscript - One thing I would be interested in is an approach that unifies file paths and URLs so that there is a consistent locator scheme for any resource, whether they be in a filesystem, on a web server, or stored in a zip file.) -- Talin

Talin writes:
+1 But doesn't file:/// do that for files, and couldn't we do something like zipfile:///nantoka.zip#foo/bar/baz.txt? Of course, we'd want to do ziphttp://your.server.net/kantoka.zip#foo/bar/baz.txt, too. That way leads to madness....

Scott Dial writes:
It would make more sense to register protocol handlers to this magical unification of resource manipulation.
I don't think it's that magical, and it's not manipulation, it's location. The question is, register where and on what? For example on my Mac there are some PDFs I want to open in Preview and others in Acrobat. To the extent that I have some classes which are one or the other, I might want to register the handler to a wildcard path object.
But allow me to perform my first channeling of Guido.. YAGNI.
True, but only because when I do need that kind of stuff I'm normally writing Emacs Lisp, not Python. We have a wide variety of functions for manipulating path strings, and they make exactly the distinction between path and inode/content that Talin does (where a path is being manipulated, the function has "filename" in its name, where a file or its metadata is being accessed, the function's name contains "file"). Nonetheless there are two or three places where programmers I respect have chosen to invent path classes to handle hairy special cases. These classes are very useful in those special cases. One place where this gets especially hairy is in the TRAMP package, which allows you to construct "remote paths" involving (for example) logging into host A by ssh, from there to host B by ssh, and finally a "relay download" of the content from host C to the local host by scp. The net effect is that you can specify the path in your "open file" dialog, and Emacs does the rest automatically; the only differences the user sees between that and a local file is the length of the path string and the time it takes to actually access the contents. Once you've done that, that process is embedded into Emacs's notion of the "current directory", so you can list the directory containing the resource, or access siblings, very conveniently. I don't expect to reproduce that functionality in Python personally, but such use cases do exist. Whether a general path class can be invented that doesn't accumulate cruft faster than use cases is another issue.

Scott Dial wrote:
I'm thinking that it was a tactical error on my part to throw in the whole "unified URL / filename namespace" idea, which really has nothing to do with the topic. Let's drop it, or start another topic, and let this thread focus on critiques of the path module, which is probably more relevant at the moment. -- Talin

stephen@xemacs.org wrote:
file:/// does indeed do it, but only the network module understands strings in that format. Ideally, you should be able to pass "file:///..." to a regular "open" function. I wouldn't expect it to be able to understand "http://". But the "file:" protocol should always be supported. In other words, I'm not proposing that the built-in file i/o package suddenly grow an understanding of network schema types. All I am proposing is a unified name space. - Talin

Talin wrote:
Ideally, you should be able to pass "file:///..." to a regular "open" function.
I'm not so sure about that. Consider that "file:///foo.bar" is a valid relative pathname on Unix to a file called "foo.bar" in a directory called "file:". That's not to say there shouldn't be a function available that understands it, but I wouldn't want it built into all functions that take pathnames. -- Greg
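The ambiguity is easy to see with the standard posixpath module, which treats the string as an ordinary relative name. A small check, using only posixpath functions:

```python
import posixpath

name = "file:///foo.bar"

is_absolute = posixpath.isabs(name)      # False: it does not start with "/"
directory, filename = posixpath.split(name)
normalized = posixpath.normpath(name)    # repeated slashes collapse

print(is_absolute)          # False
print(directory, filename)  # file: foo.bar
print(normalized)           # file:/foo.bar
```

So to the POSIX path functions this is a file named "foo.bar" inside a directory named "file:", which is why URL handling probably belongs in a separate, explicitly chosen function.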

Talin wrote:
+1 from me. (for both the fraction I quoted and everything else you said, including the locator/inode/file distinction - although I'd also add that 'symbolic link' and 'directory' exist at a similar level as 'file'). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Nick Coghlan wrote:
I would tend towards classifying directory operations as inode-level operations, in that you are working at the "filesystem as graph" level, rather than the "stream of bytes" level. When you iterate over a directory, what you are getting back is effectively inodes (well, directory entries are distinct from inodes in the underlying filesystem, but from Python there's no practical distinction.) If I could draw a UML diagram in ASCII, I would have "inode --> points to --> directory or file" and "directory --> contains * --> inode". That would hopefully make things clearer. Symbolic links, I am not so sure about; in some ways, hard links are easier to classify. --- Having done a path library myself (in C++, for our code base at work), the trickiest part is getting the Windows path manipulations right, and fitting them into a model that allows writing of platform-agnostic code. This is especially vexing when you realize that it's often useful to manipulate unix-style paths even when running under Win32 and vice versa. A prime example is that I have a lot of Python code at work that manipulates Perforce client specs files. The path specifications in these files are platform-agnostic, and use forward slashes regardless of the host platform, so "os.path.normpath" doesn't do the right thing for me.
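The normpath problem can be reproduced on any platform by calling the platform-specific modules directly instead of os.path. The spec string below is an invented example, not a real Perforce path:

```python
import ntpath
import posixpath

# Hypothetical forward-slash path as it might appear in a shared spec file.
spec = "depot/main/../release/src/./lib"

as_posix = posixpath.normpath(spec)  # keeps forward slashes
as_windows = ntpath.normpath(spec)   # what os.path.normpath does on Windows

print(as_posix)    # depot/release/src/lib
print(as_windows)  # depot\release\src\lib
```

Calling posixpath.normpath explicitly gives the same forward-slash result on every host, which is the behaviour wanted for platform-agnostic files.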
Cheers, Nick.

On Wednesday 25 October 2006 13:16, Talin wrote:
Never heard of it. It's not in the standard library, is it? I don't see it in the table of contents or the index.
This is a documentation bug. :-( I'd thought they were mentioned *somewhere*, but it looks like I'm wrong. os.path is an alias for one of several different real modules; which is selected depends on the platform. I see the following: macpath, ntpath, os2emxpath, riscospath. (ntpath is used for all Windows versions, not just NT.) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>

At 10:16 AM 10/25/2006 -0700, Talin wrote:
posixpath, ntpath, macpath, et al are the platform-specific path manipulation modules that are aliased to os.path. However, each of these modules' string path manipulation functions can be imported and used on any platform. See below: Linux: Python 2.3.5 (#1, Aug 25 2005, 09:17:44) [GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
Windows: Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information.
Note, therefore, that any "path object" system should also allow you to create and manipulate foreign paths. That is, it should have variants for each path type, rather than being locked to the local platform's path strings. Of course, the most common need for this is manipulating posix paths on non-posix platforms, but sometimes one must deal with Windows paths on Unix, too.
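A short sketch of manipulating foreign paths as described above. It relies only on the standard posixpath and ntpath modules, both of which can be imported on any platform; the sample path strings are made up for illustration.

```python
import ntpath
import posixpath

# The same forward-slash input, normalized under each platform's rules.
spec = "src/../include/./foo.h"
posix_result = posixpath.normpath(spec)  # keeps forward slashes
nt_result = ntpath.normpath(spec)        # converts to backslashes

# Windows-only concepts such as drive letters can be handled on Unix too.
drive, rest = ntpath.splitdrive("C:\\Python\\Lib\\os.py")

print(posix_result)
print(nt_result)
print(drive, rest)
```

Choosing the module explicitly, instead of going through os.path, is what keeps the result independent of the host platform.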

Talin wrote:
(Actually, the OOP approach has a slight advantage in terms of the amount of syntactic sugar available,
Even if you don't use any operator overloading, there's still the advantage that an object provides a namespace for its methods. Without that, you either have to use fairly verbose function names or keep qualifying them with a module name. Code that uses the current path functions tends to contain a lot of os.path.this(os.path.that(...)) stuff which is quite tedious to write and read. Another consideration is that having paths be a distinct data type allows for the possibility of file system references that aren't just strings. In Classic MacOS, for example, the definitive way of referencing a file is by a (volRefNum, dirID, name) tuple, and textual paths aren't guaranteed to be unique or even to exist.
A compromise might be to have all the "path algebra" operations be methods, and everything else functions which operate on path objects. That would make sense, because the path algebra ought to be a closed set of operations that's tightly coupled to the platform's path semantics. -- Greg
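One way the compromise above could look, as a rough sketch only. The class name PosixPathAlgebra, its methods, and the free function exists() are all hypothetical names invented for this illustration; nothing here is part of PEP 355 or the standard library.

```python
import os
import posixpath


class PosixPathAlgebra:
    """Hypothetical path type: only pure string algebra lives on the object."""

    def __init__(self, text):
        self.text = text

    def join(self, *parts):
        # Returns a new object, so the set of algebra operations is closed.
        return PosixPathAlgebra(posixpath.join(self.text, *parts))

    def parent(self):
        return PosixPathAlgebra(posixpath.dirname(self.text))

    def name(self):
        return posixpath.basename(self.text)

    def __str__(self):
        return self.text


def exists(path):
    """Filesystem access stays a plain function that accepts a path object."""
    return os.path.exists(str(path))


p = PosixPathAlgebra("/usr/lib").join("python", "os.py")
print(p, p.parent(), p.name())
```

The methods never touch the filesystem, so they depend only on the path syntax; anything that does touch it, such as exists(), stays outside the class.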

Greg Ewing wrote:
Given the flexibility that Python allows in naming the modules that you import, I'm not sure that this is a valid objection -- you can make the module name as short as you feel comfortable with.
That's true of textual paths in general - i.e. even on unix, textual paths aren't guaranteed to be unique or exist. It's been a while since I used classic MacOS - how do you handle things like configuration files with path names in them?
Personally, this is one of those areas where I am strongly tempted to violate TOOWTDI - I can see use cases where string-based paths would be more convenient and less typing, and other use cases where object-based paths would be more convenient and less typing. If I were designing a path library, I would create a string-based system as the lowest level, and an object based system on top of it (the reason for doing it that way is simply so that people who want to use strings don't have to suffer the cost of creating temporary path objects to do simple things like joins.) Moreover, I would keep the naming conventions of the two systems similar, if at all possible - thus, the object methods would have the same (short) names as the functions within the module. So for example:
    # Import new, refactored module io.path
    from io import path
    # Case 1 using strings
    path1 = path.join( "/Libraries/Frameworks", "Python.Framework" )
    parent = path.parent( path1 )
    # Case 2 using objects
    pathobj = path.Path( "/Libraries/Frameworks" )
    pathobj += "Python.Framework"
    parent = pathobj.parent()
Let me riff on this just a bit more - don't take this all too seriously though: Refactored organization of path-related modules (under a new name so as not to conflict with existing modules):
    io.path -- path manipulations
    io.dir -- directory functions, including dirwalk
    io.fs -- dealing with filesystem objects (inodes, symlinks, etc.)
    io.file -- file read / write streams
    # Import directory module
    import io.dir
    # String based API
    for entry in io.dir.listdir( "/Library/Frameworks" ):
        print entry # Entry is a string
    # Object based API
    dir = io.dir.Directory( "/Library/Frameworks" )
    for entry in dir: # Iteration protocol on dir object
        print entry # entry is an obj, but __str__() returns path text
    # Dealing with various filesystems: pass in a format parameter
    dir = io.dir.Directory( "/Library/Frameworks" )
    print entry.path( format="NT" ) # entry printed in NT format
    # Or you can just use a format specifier for PEP 3101 string format:
    print "Path in local system format is {0}".format( entry )
    print "Path in NT format is {0:NT}".format( entry )
    print "Path in OS X format is {0:OSX}".format( entry )
Anyway, off the top of my head, that's what a refactored path API would look like if I were doing it :) (Yes, the names are bad, can't think of better ATM.) -- Talin
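The format-specifier idea above can be approximated with the __format__ hook from PEP 3101. The sketch below is an assumption-laden illustration: the Entry class and the "NT" and "POSIX" specifier names are hypothetical, only relative paths are handled, and the default format is fixed to POSIX where the proposal above would use the local system format.

```python
import ntpath
import posixpath


class Entry:
    """Hypothetical directory entry stored as a tuple of path components."""

    def __init__(self, *parts):
        self.parts = parts

    def __format__(self, spec):
        # An empty spec is what "{0}".format(entry) passes in.
        if spec == "NT":
            return ntpath.join(*self.parts)
        if spec in ("", "POSIX"):
            return posixpath.join(*self.parts)
        raise ValueError("unknown path format: %r" % (spec,))


entry = Entry("Library", "Frameworks", "Python.framework")
nt_text = "Path in NT format is {0:NT}".format(entry)
posix_text = "Path in POSIX format is {0}".format(entry)
print(nt_text)
print(posix_text)
```

Storing components instead of a single string avoids having to parse one platform's syntax before emitting another's, though it does nothing for absolute paths or drive letters.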

On Oct 25, 2006, at 10:48 PM, Talin wrote:
You aren't supposed to use paths at all. You're supposed to use an Alias whenever you're doing long term storage of a reference to a file. This allows the user to move the file around on the disk without breaking the reference, which is nice. The alias is an opaque data structure which contains a bunch of redundant information used to locate the file. In particular, both pathname and (volumeId, dirId, name), as well as some other stuff like file size, etc. to help do fuzzy matching if the original file can't be found via the obvious locators. And for files on a file server, it also contains information on how to reconnect to the server if necessary. Much of the alias infrastructure carries over into OSX, although the strictures against using paths have been somewhat watered down. At least in OSX, you don't have the issue of the user renaming the boot volume and thus breaking every path someone ill-advisedly stored (since volume name was part of the path). For an example of aliases in OSX, open a file in TextEdit, see that it gets into the "recent items" menu. Now, move it somewhere else and rename it, and notice that it's still accessible from the menu. Separately, try deleting the file and renaming another to the same name. Notice that it also succeeds in referencing this new file. Hm, how's this related to python? I'm not quite sure. :) James

Talin wrote:
That's true of textual paths in general - i.e. even on unix, textual paths aren't guaranteed to be unique or exist.
What I mean is that it's possible for two different files to have the same pathname (since you can mount two volumes with identical names at the same time), or for a file to exist on disk yet not be accessible via any pathname (because it would exceed 255 characters). I'm not aware of any analogous situations in unix.
It's been a while since I used classic MacOS - how do you handle things like configuration files with path names in them?
True native classic MacOS software generally doesn't use pathnames. Things like textual config files are really a foreign concept to it. If you wanted to store config info, you'd probably store an alias, which points at the moral equivalent of the file's inode number, and use a GUI for editing it. However, all this is probably not very relevant now, since as far as I know, classic MacOS is no longer supported in current Python versions. I'm just pointing out that the flexibility would be there if any similarly offbeat platform needed to be supported in the future.
I don't think that expressing one platform's pathnames in the format of another is something you can do in general, e.g. going from Windows to Unix, what do you do with the drive letter? You can only really do it if you have some sort of network file system connection, and then you need more information than just the path in order to do the translation. -- Greg

Greg Ewing wrote:
I'm not sure that PEP 355 included any such support - IIRC, the path object was a subclass of string. That isn't, however, a defense against what you are saying - just because neither the current system nor the proposed improvement supports the kinds of file references you are speaking of, doesn't mean it shouldn't be done. However, this does kind of suck for a cross-platform scripting language like Python. It means that any cross-platform app which requires access to multiple data files that contain inter-file references essentially has to implement its own virtual file system. (Python module imports being a case in point.) One of the things that I really love about Python programming is that I can sit down and start hacking on a new project without first having to go through an agonizing political decision about what platforms I should support. It used to be that I would spend hours ruminating over things like "Well...if I want any market share at all, I really should implement this as a Windows program...but on the other hand, I won't enjoy writing it nearly as much." Then comes along Python and removes all of that bothersome hacker-angst. Because of this, I am naturally disinclined to incorporate into my programs any concept which doesn't translate to other platforms. I don't mind writing some platform-specific code, as long as it doesn't take over my program. It seems that any Python program that manipulated paths would have to be radically different in the environment that you describe. How about this: In my ontology of path APIs given earlier, I would tend to put the MacOS file reference in the category of "file locator schemes other than paths". In other words, what you are describing isn't IMHO a path at all, but it is like a path in that it describes how to get to a file. (It's almost like an inode or dirent in some ways.)
An alternative approach is to try and come up with an encoding scheme that allows you to represent all of that platform-specific semantics in a string. This leaves you with the unhappy choice of "inventing" a new path syntax for an old platform, however.
Yeah, probably not. See, I told you not to take it too seriously! But I do feel that it's important to be able to manipulate posix-style path syntax on non-posix platforms, given how many cross-platform applications there are that have a cross-platform path syntax. In my own work, I find that drive letters are never explicitly specified in config files. Any application such as a parser, template generator, or resource manager (in other words, any application whose data files are routinely checked in to the source control system or shared across a network) tends to 'see' only relative paths in its input files, and embedding absolute paths is considered an error on the user's part. Of course, those same apps *do* internally convert all those relative paths to absolute, so that they can be compared and resolved with respect to some common base. Then again, in my opinion, the only *really* absolute paths are fully-qualified URLs. So there. :)
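A minimal sketch of the workflow described above: posix-style relative paths from an input file are rejected if absolute, then resolved against a common base so they can be compared. The function name resolve() and the sample base directory are hypothetical; only posixpath calls from the standard library are used.

```python
import posixpath


def resolve(base, relative):
    """Resolve a posix-style relative path against base, on any platform."""
    if posixpath.isabs(relative):
        # Embedded absolute paths are treated as a user error.
        raise ValueError("absolute path not allowed: %r" % (relative,))
    return posixpath.normpath(posixpath.join(base, relative))


base = "/work/project"
first = resolve(base, "src/../include/foo.h")
second = resolve(base, "./include/foo.h")

# Two different spellings resolve to the same location and compare equal.
print(first, first == second)
```

Because posixpath is used directly, the comparison gives the same answer on Windows as on Unix, which os.path would not guarantee.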

Talin wrote:
It seems that any Python program that manipulated paths would have to be radically different in the environment that you describe.
I can sympathise with that. The problem is really inherent in the nature of the platforms -- it's just not possible to do everything in a native classic MacOS way and be cross-platform at the same time. There has to be a compromise somewhere. With classic MacOS the compromise was usually to use pathnames and to heck with the consequences. You could get away with it most of the time.
Yes, that's true. Calling it a "path" would be something of a historical misnomer.
Yes, I thought of that, too. That's what you would have to do under the current scheme if you ever encountered a platform which truly had no textual representation of file locations. But realistically, it seems unlikely that such a platform will be invented in the foreseeable future (even classic MacOS *had* a notion of paths, even if it wasn't the preferred representation). So all this is probably YAGNI. -- Greg

BJörn Lindqvist wrote:
So...how's that post mortem coming along? Did you get a sufficient answer to your questions? And the more interesting question is, will the effort to reform Python's path functionality continue? From reading all the responses to your post, I feel that the community is on the whole supportive of the idea of refactoring os.path and friends, but they prefer a different approach, and several of the responses sketch out some suggestions for what that approach might be. So what happens next? -- Talin

On 10/28/06, Talin <talin@acm.org> wrote:
Yes and no. All posts have very exhaustively explained why the implementation in PEP 355 is far from optimal. And I can see why it is. However, what I am uncertain of is Guido's opinion on the background and motivation of the PEP: "Many have felt that the API for manipulating file paths as offered in the os.path module is inadequate." "Currently, Python has a large number of different functions scattered over half a dozen modules for handling paths. This makes it hard for newbies and experienced developers to choose the right method." IMHO, the current API is very messy. But when it comes to PEPs, it is mostly Guido's opinion that counts. :) Unless he sees a problem with the current situation, there is no point in writing more PEPs.
And the more interesting question is, will the effort to reform Python's path functionality continue?
I certainly hope so. But maybe it is better to target Python 3000, or maybe the Python devs already have ideas for how they want the path APIs to look?
So what happens next?
I really hope that Guido will give his input when he has more time. Mvh Björn

BJörn Lindqvist wrote:
I think targeting Py3K is a good idea. The whole purpose of Py3K is to "clean up the messes" of past decisions, and to that end, a certain amount of backwards-compatibility breakage will be allowed (although if that can be avoided, so much the better.) And to the second point, having been following the Py3K list, I don't think anyone has expressed any preconceived notions of how they want things to look (well, except I know I do, but I'm not a core dev :) :).
So what happens next?
I really hope that Guido will give his input when he has more time.
First bit of advice is, don't hold your breath. Second bit of advice is, if you really do want Guido's feedback (or the core Python devs'), start by creating a (short) list of the outstanding points of controversy to be resolved. Once those issues have been decided, then proceed to the next stage, building consensus by increments. Basically, anything that requires Guido to read more than a page of material isn't going to get done quickly. At least, in my experience :)
Mvh Björn

On 9/30/06, Giovanni Bajo <rasky@develer.com> wrote:
Here are my guesses. I believe Guido rejected this PEP for a lot of reasons. By the way, what I'm about to do is known as "channeling Guido (badly)" and I'm pretty sure it annoys him. Sorry, Guido. Please don't treat the following as authoritative; I have never met Guido and obviously I cannot speak for him.
- I don't think Guido ever saw much benefit from "path objects". That is, the Motivation was not compelling. I think the main motivation is to eliminate some clutter and add a handful of useful methods to the stdlib, so it's easy to see how this could be the case.
- Guido just flat-out didn't like the looks of the PEP. Too much weirdness. (path.py contains more weirdness, including some stuff Guido particularly disliked, and I think it's fair to say that PEP355 suffered somewhat by association.)
- Any proposal to add a Second Way To Do It has to meet a very high standard. PEP355 was too big to be considered an incremental change. Yet it didn't even attempt to fix all the perceived problems with the existing APIs. A more thorough job would have had a better chance.
- Nobody liked the API design--too many methods.
- Now we're hearing rumors of better ideas out there, which comes as a relief.
I suspect any one of these could have scuttled the proposal. -j
participants (17): BJörn Lindqvist; Fred L. Drake, Jr.; Fredrik Lundh; Georg Brandl; Giovanni Bajo; glyph@divmod.com; Greg Ewing; Guido van Rossum; James Y Knight; Jason Orendorff; Luis P Caamano; Michael Hudson; Nick Coghlan; Phillip J. Eby; Scott Dial; stephen@xemacs.org; Talin