New subject: Path object design

1 Nov 2006

      On 01:46 am, sluggoster@gmail.com wrote:
...
On 11/1/06, glyph@divmod.com <glyph@divmod.com> wrote:
...
This is ironic coming from one of Python's celebrity geniuses.  "We
made this class but we don't know how it works."  Actually, it's
downright alarming coming from someone who knows Twisted inside and
out yet still can't make sense of path patform oddities.
Man, it is going to be hard being ironically self-deprecating if people keep going around calling me a "celebrity genius".  My ego doesn't need any help, you know? :)

In some sense I was being serious; part of the point of abstraction is embedding some of your knowledge in your code so you don't have to keep it around in your brain all the time.  I'm sure that my analysis of path-based problems wasn't exhaustive because I don't really use os.path for path manipulation.  I use static.File and it _works_, I only remember these os.path flaws from the process of writing it, not daily use.
...
...
* This is confusing as heck:
...
...
...
os.path.join("hello", "/world")
   '/world'
That's in the documentation.  I'm not sure it's "wrong".  What should
it do in this situation?  Pretend the slash isn't there?
You can document anything.  That doesn't really make it a good idea.

The point I was trying to make wasn't really that os.path is *wrong*.  Far from it, in fact, it defines some useful operations and they are basically always correct.  I didn't even say "wrong", I said "confusing".  FilePath is implemented strictly in terms of os.path because it _does_ do the right thing with its inputs.  The question is, how hard is it to remember what its inputs should be?
...
...
...
...
...
os.path.join("hello", "slash/world")
   'hello/slash/world'
That has always been a loophole in the function, and many programs
depend on it.
If you ever think I'm suggesting breaking something in Python, you're misinterpreting me ;).  I am as cagey as they come about this.  No matter what else happens, the behavior of os.path should not really change.
...
The user didn't call normpath, so should we normalize it anyway?
That's really the main point here.

What is a path that hasn't been "normalized"?  Is it a path at all, or is it some random garbage with slashes (or maybe other things) in it?  os.path performs correct path algebra on correct inputs, and it's correct (as far as one can be correct) on inputs that have weird junk in them.

In the strings-and-functions model of paths, this all makes perfect sense, and there's no particular sensibility associated with defining ideas like "equivalency" for paths, unless that's yet another function you pass some strings to.  I definitely prefer this:
    path1 == path2
to this:
    os.path.abspath(pathstr1) == os.path.abspath(pathstr2)
though.

You'll notice I used abspath instead of normpath.  As a side note, I've found interpreting relative paths as always relative to the current directory is a bad idea.  You can see this when you have a daemon that daemonizes and then opens files: the user thinks they're specifying relative paths from wherever they were when they ran the program, the program thinks they're relative paths from /var/run/whatever.  Relative paths, if they should exist at all, should have to be explicitly linked as relative to something *else* (e.g. made absolute) before they can be used.  I think that sequences of strings might be sufficient though.
...
Good point, but exactly what functionality do you want to see for zip
files and URLs?  Just pathname manipulation?  Or the ability to see
whether a file exists and extract it, copy it, etc?
The latter.  See http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py

This is still _really_ raw functionality though.  I can't claim that it has the same "it's been used in real code" endorsement as the rest of the FilePath stuff I've been talking about.  I've never even tried to hook this up to a Twisted webserver, and I've only used it in one environment.
...
...
* you have to care about unicode sometimes.
...
This is a Python-wide problem.
I completely agree, and this isn't the thread to try to solve it.  The absence of a path object, however, and the path module's reliance on strings, exacerbates the problem.  The fact that FilePath doesn't deal with this either, however, is a fairly good indication that the problem is deeper than that.
...
...
* the documentation really can't emphasize enough how bad using
'os.path.exists/isfile/isdir', and then assuming the file continues to exist
when it is a contended resource, is.  It can be handy, but it is _always_ a
race condition.
What else can you do?  It's either os.path.exists()/os.remove() or "do
it anyway and catch the exception".  And sometimes you have to check
the filetype in order to determine *what* to do.
You have to catch the exception anyway in many cases.  I probably shouldn't have mentioned it though, it's starting to get a bit far afield of even this ridiculously far-ranging discussion.  A more accurate criticism might be that "the absence of a file locking system in the stdlib means that there are lots outside it, and many are broken".  Different issue though; if it's related, it's a different method that can be added later.

Re: [Python-Dev] Path object design

glyph＠divmod.com

Mike Orr

Greg Ewing

tags

participants (3)