Re: [Python-Dev] Path object design
On 01:46 am, sluggoster@gmail.com wrote:
On 11/1/06, glyph@divmod.com <glyph@divmod.com> wrote:
This is ironic coming from one of Python's celebrity geniuses. "We made this class but we don't know how it works." Actually, it's downright alarming coming from someone who knows Twisted inside and out yet still can't make sense of path platform oddities.
Man, it is going to be hard being ironically self-deprecating if people keep going around calling me a "celebrity genius". My ego doesn't need any help, you know? :)

In some sense I was being serious; part of the point of abstraction is embedding some of your knowledge in your code so you don't have to keep it around in your brain all the time. I'm sure that my analysis of path-based problems wasn't exhaustive, because I don't really use os.path for path manipulation. I use static.File and it _works_; I only remember these os.path flaws from the process of writing it, not from daily use.
* This is confusing as heck:
>>> os.path.join("hello", "/world")
'/world'
That's in the documentation. I'm not sure it's "wrong". What should it do in this situation? Pretend the slash isn't there?
You can document anything. That doesn't really make it a good idea. The point I was trying to make wasn't really that os.path is *wrong*. Far from it, in fact, it defines some useful operations and they are basically always correct. I didn't even say "wrong", I said "confusing". FilePath is implemented strictly in terms of os.path because it _does_ do the right thing with its inputs. The question is, how hard is it to remember what its inputs should be?
>>> os.path.join("hello", "slash/world")
'hello/slash/world'
That has always been a loophole in the function, and many programs depend on it.
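Both behaviours can be reproduced side by side with the snippet below. It calls posixpath.join directly (posixpath is the module os.path refers to on POSIX systems) so the results do not depend on the platform; on Windows, os.path is ntpath, which also has to deal with drive letters and backslashes.

```python
import posixpath

# An absolute component discards everything joined before it.
print(posixpath.join("hello", "/world"))        # /world

# A component that contains "/" is spliced in unchanged, which is the
# "loophole" many programs rely on.
print(posixpath.join("hello", "slash/world"))   # hello/slash/world
```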
If you ever think I'm suggesting breaking something in Python, you're misinterpreting me ;). I am as cagey as they come about this. No matter what else happens, the behavior of os.path should not really change.
The user didn't call normpath, so should we normalize it anyway?
That's really the main point here. What is a path that hasn't been "normalized"? Is it a path at all, or is it some random garbage with slashes (or maybe other things) in it? os.path performs correct path algebra on correct inputs, and it's correct (as far as one can be correct) on inputs that have weird junk in them.

In the strings-and-functions model of paths, this all makes perfect sense, and there's no particular reason to define ideas like "equivalency" for paths, unless that's yet another function you pass some strings to. I definitely prefer this:

    path1 == path2

to this:

    os.path.abspath(pathstr1) == os.path.abspath(pathstr2)

though. You'll notice I used abspath instead of normpath.

As a side note, I've found that interpreting relative paths as always relative to the current directory is a bad idea. You can see this when you have a daemon that daemonizes and then opens files: the user thinks they're specifying relative paths from wherever they were when they ran the program, while the program thinks they're relative paths from /var/run/whatever. Relative paths, if they should exist at all, should have to be explicitly linked as relative to something *else* (e.g. made absolute) before they can be used. I think that sequences of strings might be sufficient, though.
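A minimal sketch of what `path1 == path2` could mean, assuming a hypothetical `Path` class that stores the result of os.path.abspath() and compares those strings. This is not FilePath's actual implementation, just the abspath comparison from the message wrapped in an object:

```python
import os.path

class Path:
    """Hypothetical path object whose equality mirrors
    os.path.abspath(pathstr1) == os.path.abspath(pathstr2)."""

    def __init__(self, pathstr):
        # abspath() resolves a relative string against the *current*
        # directory at construction time.
        self._abs = os.path.abspath(pathstr)

    def __eq__(self, other):
        if not isinstance(other, Path):
            return NotImplemented
        return self._abs == other._abs

    def __hash__(self):
        # Equal paths must hash equally so they behave in sets and dicts.
        return hash(self._abs)

    def __repr__(self):
        return "Path(%r)" % (self._abs,)


path1 = Path("a/b/../c")
path2 = Path("a/./c")
print(path1 == path2)   # True: both become <cwd>/a/c
```

Because abspath() resolves relative strings against os.getcwd() when the object is built, this class inherits exactly the daemon problem from the side note. And since abspath() normalizes lexically (like normpath(), it collapses `..` without consulting the filesystem), two equal objects can still name different files when symbolic links are involved.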
Good point, but exactly what functionality do you want to see for zip files and URLs? Just pathname manipulation? Or the ability to see whether a file exists and extract it, copy it, etc?
The latter. See http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py This is still _really_ raw functionality though. I can't claim that it has the same "it's been used in real code" endorsement as the rest of the FilePath stuff I've been talking about. I've never even tried to hook this up to a Twisted webserver, and I've only used it in one environment.
* you have to care about unicode sometimes.
This is a Python-wide problem.
I completely agree, and this isn't the thread to try to solve it. The absence of a path object, however, and the path module's reliance on strings, exacerbate the problem. The fact that FilePath doesn't deal with this either, though, is a fairly good indication that the problem is deeper than that.
* the documentation really can't emphasize enough how bad it is to use 'os.path.exists/isfile/isdir' and then assume the file continues to exist when it is a contended resource. It can be handy, but it is _always_ a race condition.
What else can you do? It's either os.path.exists()/os.remove() or "do it anyway and catch the exception". And sometimes you have to check the filetype in order to determine *what* to do.
You have to catch the exception anyway in many cases. I probably shouldn't have mentioned it, though; it's starting to get a bit far afield of even this ridiculously far-ranging discussion. A more accurate criticism might be that "the absence of a file locking system in the stdlib means that there are lots outside it, and many are broken". That's a different issue, though; even if it's related, it would be a different method that can be added later.
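The "do it anyway and catch the exception" style can be sketched as follows. `remove_if_present` is a hypothetical helper name, and FileNotFoundError assumes Python 3.3 or later (in 2006 this surfaced as an OSError with errno set to ENOENT).

```python
import os

def remove_if_present(pathstr):
    """Hypothetical helper: delete pathstr without an exists() pre-check.

    An os.path.exists() test followed by os.remove() leaves a window in
    which another process can delete the file first. Attempting the
    removal directly and treating "already gone" as an expected outcome
    avoids that check-then-act window.
    """
    try:
        os.remove(pathstr)
    except FileNotFoundError:
        return False   # never existed, or someone else removed it first
    return True
```

Other failures (permissions, the path being a directory) still propagate as exceptions, which is usually what a caller wants.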
On 11/1/06, glyph@divmod.com <glyph@divmod.com> wrote:
On 01:46 am, sluggoster@gmail.com wrote:
On 11/1/06, glyph@divmod.com <glyph@divmod.com> wrote:
This is ironic coming from one of Python's celebrity geniuses. "We made this class but we don't know how it works." Actually, it's downright alarming coming from someone who knows Twisted inside and out yet still can't make sense of path platform oddities.
Man, it is going to be hard being ironically self-deprecating if people keep going around calling me a "celebrity genius". My ego doesn't need any help, you know? :)
I respect Twisted in the same way I respect a loaded gun. It's powerful, but approach with caution.
If you ever think I'm suggesting breaking something in Python, you're misinterpreting me ;). I am as cagey as they come about this. No matter what else happens, the behavior of os.path should not really change.
The point is, what *should* a join-like method do in a future improved path module? os.path.join should not change because too many programs depend on its current behavior, in ways we can't necessarily predict. But a new function/method is not bound by these constraints, as long as the boundary cases are well documented. All the os.path and file-related os/shutil functions need to be reexamined in this context. Maybe the existing behavior is best, maybe we'll keep it even if it's sub-optimal, but we should document why we're making these choices.
The user didn't call normpath, so should we normalize it anyway?
That's really the main point here.
What is a path that hasn't been "normalized"? Is it a path at all, or is it some random garbage with slashes (or maybe other things) in it? os.path performs correct path algebra on correct inputs, and it's correct (as far as one can be correct) on inputs that have weird junk in them.
I'm tempted to say Path("/a/b").join("c", "d") should do the same thing your .child method does, but allow multiple levels in one step. But on the other hand, there will always be people with prebuilt "path/fragments" to join to other fragments, and I'm not sure we should force them to split the fragment just to rejoin it again. Maybe we need a .join_unsafe method for this, haha.
-- Mike Orr <sluggoster@gmail.com>
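One possible reading of this suggestion, sketched with hypothetical helpers: `child_join` accepts several levels at once but insists that each argument is a single component, and `join_fragment` (standing in for the joking `.join_unsafe`) splits a prebuilt "path/fragment" for the caller. POSIX-style separators are assumed via posixpath; none of these names exist in os.path or Twisted.

```python
import posixpath

def child_join(base, *segments):
    """Hypothetical strict join: every segment must be exactly one path
    component, like a chain of .child() calls done in a single step."""
    for seg in segments:
        if seg in ("", ".", "..") or "/" in seg:
            raise ValueError("not a single path component: %r" % (seg,))
    return posixpath.join(base, *segments)

def join_fragment(base, fragment):
    """Hypothetical convenience for prebuilt "path/fragments": split the
    fragment for the caller, then apply the same strict rules."""
    if fragment.startswith("/"):
        raise ValueError("fragment must be relative: %r" % (fragment,))
    # Empty pieces from doubled slashes ("a//b") are dropped.
    return child_join(base, *[p for p in fragment.split("/") if p])


print(child_join("/a/b", "c", "d"))           # /a/b/c/d
print(join_fragment("/a/b", "slash/world"))   # /a/b/slash/world
```

Rejecting absolute fragments and `..` outright, rather than reproducing os.path.join's "last absolute component wins" rule, is a design choice; a real method would need those boundary cases documented either way.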
glyph@divmod.com wrote:
Relative paths, if they should exist at all, should have to be explicitly linked as relative to something *else* (e.g. made absolute) before they can be used.
If paths were opaque objects, this could be enforced by not having any way of constructing a path that wasn't rooted in some existing absolute path.
-- Greg
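Greg's idea can be illustrated with a hypothetical `RootedPath` class: the constructor refuses anything that is not absolute, and the only way to reach a deeper path is `child()`, so a bare relative path never exists as an object. POSIX semantics via posixpath are an assumption of this sketch, and `child()` is only loosely modelled on the FilePath `.child` method mentioned earlier, not a copy of it.

```python
import posixpath

class RootedPath:
    """Hypothetical opaque path that is always anchored to an absolute root."""

    def __init__(self, root):
        # Refuse relative strings instead of silently resolving them
        # against the current directory.
        if not posixpath.isabs(root):
            raise ValueError("root must be absolute: %r" % (root,))
        self._abs = posixpath.normpath(root)

    def child(self, name):
        # Descend exactly one level; separators, "." and ".." are refused,
        # so the result always stays underneath this path.
        if name in ("", ".", "..") or "/" in name:
            raise ValueError("not a single path component: %r" % (name,))
        return RootedPath(posixpath.join(self._abs, name))

    def __str__(self):
        return self._abs

    def __repr__(self):
        return "RootedPath(%r)" % (self._abs,)


run_dir = RootedPath("/var/run/whatever")
pidfile = run_dir.child("daemon.pid")
print(pidfile)   # /var/run/whatever/daemon.pid
```

In the daemon scenario, the program would have to turn the user's relative argument into a RootedPath explicitly (for example, from the directory captured before daemonizing) rather than having it reinterpreted against /var/run/whatever.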
participants (3)
- glyph@divmod.com
- Greg Ewing
- Mike Orr